Automated classification and taxonomy of 3D tooth data using deep learning methods
Patent abstract:
The present invention relates to a computer-implemented method for automated classification of 3D image data of teeth, comprising: a computer receiving one or more 3D image data sets, a 3D image data set defining an image volume of voxels, the voxels representing 3D tooth structures in the image volume, the image volume being associated with a 3D coordinate system; the computer pre-processing each of the 3D image data sets; and the computer providing each of the pre-processed 3D image data sets to the input of a trained deep neural network, the trained deep neural network classifying each of the voxels in a 3D image data set based on a plurality of candidate tooth labels of a dentition, wherein the classification of a 3D image data set includes generating, for at least part of the voxels of the 3D image data set, activation values for the candidate tooth labels, an activation value associated with a candidate tooth label defining the probability that the labelled data point represents a tooth type indicated by the candidate tooth label.
Publication number: BR112020006544A2
Application number: R112020006544-7
Filing date: 2018-10-02
Publication date: 2020-09-29
Inventors: David Anssari Moin; Frank Theodorus Catharina Claessen; Bas Alexander Verheij
Applicant: Promaton Holding B.V.
IPC main class:
Patent description:
[001] [001] The invention relates to the automated localization, classification and taxonomy of 3D tooth data using deep learning methods, and in particular, though not exclusively, to systems and methods for automated localization, classification and taxonomy of 3D tooth data using deep learning methods, a method for training such a deep learning neural network, and a computer program product for using such a method.
BACKGROUND OF THE INVENTION
[002] [002] The reliable identification of tooth types and tooth arrangements plays a very important role in a wide range of applications, including (but not limited to) dental care and dental reporting, orthodontics, orthognathic surgery, forensic investigation and biometrics. Therefore, several computer-aided techniques have been developed to automate, or at least partially automate, the process of classifying and numbering teeth according to a known dental notation scheme. In addition, any reduction in the time required to reliably classify and taxonomize teeth will be beneficial in such fields of application.
[003] [003] For the purpose of this disclosure, 'tooth' refers to the whole of a tooth, including crown and root; 'teeth' refers to any set of teeth consisting of two or more teeth, while a set of teeth originating from a single person will be referred to as originating from a 'dentition'. A dentition may not necessarily contain an individual's complete set of teeth. In addition, 'classification' refers to identifying to which of a set of categories an observation or sample belongs. In the case of dental taxonomy, classification refers to the process of identifying to which category (or label) an individual tooth belongs. 'Taxonomy' refers to the process of deriving a tooth class for all individual teeth of a single dentition, and 3D tooth data refers to any digital representation of any (set of) teeth, for example a 3D voxel representation of a filled volume, densities in a volume, a 3D surface mesh, etc. In addition, 3D tooth data representing a dentition can either include a complete set of teeth or part of a complete set. Unless stated differently in this application, the term 'segmentation' refers to semantic segmentation, which refers to dense predictions for each voxel, so that each voxel of the input space is labelled with a certain object class. Unlike bounding box segmentation, which refers to the discovery of region boundaries, semantic segmentation produces semantically interpretable 3D masks in the input data space.
[004] [004] For example, US2017/0169562 describes a system for automatic tooth type recognition based on intraoral 3D optical scans. Such an intraoral optical scanner is capable of generating a 3D scan of the exposed parts of the teeth, that is, the crowns of the teeth. The shape of each crown is derived from the 3D scan and represented as a 3D mesh, including faces and vertices. These 3D meshes are subsequently used to determine aggregated features for each tooth. The thus obtained aggregated features and the associated tooth type are then used as training data to train classifiers using traditional machine learning methodologies, such as support vector machines or decision trees.
[005] [005] Although this system is capable of processing high-resolution intraoral 3D scans as input data, it is not capable of processing volumetric dento-maxillofacial images that are generated using Cone Beam Computed Tomography (CBCT).
CBCT is a medical imaging technique that uses X-ray computed tomography in which the X-ray radiation is shaped into a divergent, low-dose cone. CBCT imaging is the most widely used 3D imaging technique in the dental field and generates 3D image data of dento-maxillofacial structures, which may include (parts of) jaw bones, complete or partial dental structures, including crowns and roots, and (parts of) the inferior alveolar nerve. Image analysis of CBCT image data, however, poses a substantial problem, since, in CBCT scans, the radiodensity, measured in Hounsfield Units (HUs), is not consistent because different areas in the scan appear with different gray-scale values depending on their relative positions in the organ being scanned. HUs measured from the same anatomical area with both CBCT and medical-grade CT scanners are not identical and are thus not reliable for determining site-specific, radiographically identified bone density.
[006] [006] Furthermore, CBCT systems for scanning dento-maxillofacial structures do not employ a standardized system for scaling the gray levels that represent the reconstructed density values. These values are, as such, arbitrary and do not allow, for example, the assessment of bone quality. In the absence of such a standardization, it is difficult to interpret the gray levels, and it is impossible to compare the values resulting from different machines. In addition, the root structures of teeth and jaw bone have similar densities, so that it is difficult for a computer, for example, to distinguish between voxels belonging to teeth and voxels belonging to a jaw. Additionally, CBCT systems are very sensitive to so-called beam hardening, which produces dark bands between two high-attenuation objects (such as metal or bone), with surrounding bright bands. The aforementioned problems make the fully automatic segmentation of dento-maxillofacial structures, the classification of segmented dental structures and, more generally, the automated taxonomy of 3D tooth data derived from 3D CBCT image data particularly challenging.
[007] [007] This problem is, for example, discussed and illustrated in the article by Miki et al,
[008] [008] In their article, Miki et al suggested that accuracy could be improved by using 3D CNN layers instead of 2D CNN layers. Taking the same architectural principles of the neural network, however, and converting them to a 3D variant will lead to compromises with regard to the granularity of the data, in particular the maximum resolution (for example, mm represented per data point, be it a 2D pixel or a 3D voxel) in the applicable orthogonal directions. Considering the computational requirements, in particular the memory bandwidth required for processing, such 3D bounding boxes of voxels will have a considerably lower resolution than would be possible for 2D bounding boxes of a 2D axial slice. Thus, the benefit of having information available for the full 3D volume containing a tooth will, in practice, be undercut by the loss of information due to the down-sampling of the image that is necessary to process the voxels at a reasonable computational load. Especially in problematic regions of 3D CBCT data, such as, for example, transitions between individual teeth or between bone and teeth, this will negatively affect a sufficiently accurate per-tooth classification result.
[009] [009] Whereas the previously discussed article by Miki et al considers automatic classification based on manual bounding box segmentation, automatic bounding box segmentation (and therefore the localization of the tooth and the possibility of fully automatic classification) is addressed in a later article published by the same authors, Miki et al, "Tooth labeling in cone-beam CT using deep convolutional neural network for forensic identification", Progress in Biomedical Optics and Imaging 10134 (2017) pp. 101343E-1 - 101343E-6. In this article, again in the 2D domain of axial slices from CBCT scans, a convolutional neural network is trained and used to produce a heat map that indicates the per-pixel probability of belonging to a tooth region. This heat map is filtered and the 2D bounding boxes containing the teeth are selected using a non-maximum suppression method. Positive and negative example bounding boxes are used to train a convolutional neural network as referenced in their first article discussed above, and the trained network was evaluated. Again, when trying to adapt this methodology to work on 3D image data, the same considerations set out above need to be taken into account. Converting the same architectural principles of the neural network to a 3D CNN to generate a 3D heat map will result in a significantly long processing time. In addition, also in this case, the need for down-sampling in order to deal with bandwidth limitations will have a negative impact on segmentation and classification performance.
[010] [010] Thus, trying to extrapolate the 2D case described by Miki et al to a 3D case will lead, in almost all cases, to bounding boxes (of voxels) with an accuracy lower than what would be possible given the resolution of the original input data set, thereby significantly reducing the accuracy of predicting the confidence of a pixel being part of a tooth region (for example, a pixel that is part of a slice containing jaw bone but that incidentally 'looks like' a tooth may incorrectly be assigned a high tooth confidence). The low resolution will, in particular, have consequences for the accuracy of the classification results. The network receives little to no information concerning neighbouring teeth, tissue, bone structures, etc. Such information would be highly valuable, not only for determining the seven tooth classes investigated by Miki et al, but also for potentially yielding superior classification accuracy across all 32 types of individual teeth that may be present in a healthy adult dentition.
[011] [011] Additionally, when a tooth is only partially present in a received 3D image, as is often the case with quadrant CBCT scans where parts of a tooth are beyond the field of view of the scanning device, the fact that the networks were trained on images containing complete teeth will again be detrimental to both the identification of a tooth region and the classification of a tooth.
[012] [012] The aforementioned problems make the realization of a system that is capable of fully automated localization, classification and taxonomy of 3D tooth data under certain computational constraints very challenging, especially if the automated taxonomy of the 3D tooth data is based on volumetric 3D CBCT data.
[013] [013] Therefore, there is a need in the art for computer systems that are adapted to precisely localize, classify and taxonomize 3D tooth data sets, in particular 3D tooth data derived from heterogeneous volumetric 3D CBCT image data, into individual tooth types.
In particular, there is a need in the art for computer systems that are adapted to precisely and timely localize, classify and taxonomize 3D tooth data into tooth types, in a data structure that links the data sets representing 3D teeth to objects corresponding to the 32 possible teeth of an adult.
SUMMARY OF THE INVENTION
[014] [014] As will be appreciated by those skilled in the art, aspects of the present invention can be embodied as a system, a method or a computer program product. Accordingly, aspects of the present invention can take the form of an entirely hardware embodiment, an entirely software embodiment (including embedded software, resident software, microcode, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a 'circuit', 'module' or 'system'.
[015] [015] Any combination of one or more computer-readable medium(s) may be used. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium can be, for example, but without limitation, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media would include the following: an electrical connection having one or more wires, a portable floppy disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium can be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device.
[016] [016] A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example in baseband or as part of a carrier wave. Such a propagated signal can take any of a variety of forms.
[017] [017] The program code embodied in a computer-readable medium can be transmitted using any appropriate medium, including, but not limited to, wireless, wired, optical fiber, cable, RF, etc., or any suitable combination of the foregoing. Computer program code for carrying out operations for aspects of the present invention can be written in any combination of one or more programming languages, including a functional or an object-oriented programming language, such as Java(TM), Scala, C++, Python or the like, and conventional procedural programming languages, such as the "C" programming language, or similar programming languages. The program code can run entirely on the user's computer, partially on the user's computer, as a stand-alone software package, partially on the user's computer and partially on a remote computer, or entirely on the remote computer, server or virtualized server. In the latter scenario, the remote computer can be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection can be made to an external computer (for example, over the Internet using an Internet Service Provider).
[018] [018] Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor, in particular a microprocessor or central processing unit (CPU), or graphics processing unit (GPU), of a general-purpose computer, special-purpose computer or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer, other programmable data processing apparatus or other devices, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
[019] [019] These computer program instructions can also be stored on a computer-readable medium that can direct a computer, another programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored on the computer-readable medium produce an article of manufacture including instructions that implement the function/act specified in the flowchart and/or block diagram block or blocks.
[020] [020] The computer program instructions can also be loaded onto a computer, another programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer-implemented process, such that the instructions that execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
[021] [021] The flowchart and block diagrams in the figures illustrate the architecture, functionality and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams can represent a module, segment or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession can, in fact, be executed substantially concurrently, or the blocks can sometimes be executed in reverse order, depending on the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
[022] [022] In a first aspect, the invention can relate to a computer-implemented method for processing 3D data that represents a dento-maxillofacial structure.
The method may comprise: a computer receiving 3D data, preferably 3D cone beam CT (CBCT) data, the 3D data including a voxel representation of the dento-maxillofacial structure, the dento-maxillofacial structure comprising a dentition, a voxel being associated with at least a radiation intensity value, the voxels of the voxel representation defining an image volume; the computer providing the voxel representation to the input of a first 3D deep neural network, the 3D deep neural network being trained to classify the voxels of the voxel representation into one or more tooth classes, preferably into at least 32 tooth classes of a dentition; the first deep neural network comprising a plurality of first 3D convolutional layers defining a first convolutional path and a plurality of second 3D convolutional layers defining a second convolutional path parallel to the first convolutional path, the first convolutional path being configured to receive at its input a first block of voxels from the voxel representation and the second convolutional path being configured to receive a second block of voxels from the voxel representation, the first and second blocks of voxels having the same or substantially the same centre point in the image volume and the second block of voxels representing a volume in real-world dimensions that is larger than the volume in real-world dimensions of the first block of voxels, the second convolutional path determining contextual information for the voxels of the first block of voxels; the outputs of the first and second convolutional paths being connected to at least one fully connected layer for classifying the voxels of the first block of voxels into the one or more tooth classes; and the computer receiving classified voxels of the voxel representation of the dento-maxillofacial structure from the output of the first 3D deep neural network.
[023] [023] By using such a 3D neural network architecture, individual tooth classification can be both trained and inferred using as much information relevant to the problem as possible, at appropriate scales, given the limitations of modern hardware. Not only is it highly effective with respect to considering both the localization of tooth structure (yielding a semantic segmentation at the native resolution of the received 3D data) and the classification of such a structure (uniquely classifying each tooth that may be present in a healthy adult dentition), it is also effective with respect to the required processing time, owing to its ability to process a multiplicity of output voxels in parallel. Owing to the way in which samples are offered to the 3D deep neural network, the classification can also be performed on teeth that are only partially present in a scan.
[024] [024] In one embodiment, the volume of the second block of voxels may be larger than the volume of the first block of voxels, the second block of voxels representing a down-sampled version of the first block of voxels, preferably the down-sampling factor being selected between 20 and 2, more preferably between 10 and 3.
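Purely by way of illustration, the sketch below shows how a dual-path 3D convolutional network of the kind described in [022]-[024] might be laid out, assuming PyTorch. The block size of 24 voxels, the channel counts, the down-sampling factor of 4 (within the preferred range of 3 to 10) and the extra background class (33 = 32 tooth classes + 1) are assumptions made for the example and are not taken from the disclosure.

```python
import torch
import torch.nn as nn

class DualPathToothNet(nn.Module):
    """Sketch of a two-path 3D CNN: a native-resolution path plus a
    down-sampled context path sharing the same centre point."""

    def __init__(self, n_classes=33, ds_factor=4):
        super().__init__()
        # Path 1: first block of voxels at the native resolution of the scan.
        self.local_path = nn.Sequential(
            nn.Conv3d(1, 24, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(24, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Path 2: second block of voxels covering a larger real-world volume,
        # fed in down-sampled by ds_factor and up-sampled again so that its
        # features can be concatenated with those of the first path.
        self.context_path = nn.Sequential(
            nn.Conv3d(1, 24, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(24, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=ds_factor, mode="trilinear",
                        align_corners=False),
        )
        # Fully connected (1x1x1 convolutional) layers combining both paths
        # into per-voxel scores over the candidate tooth classes.
        self.classifier = nn.Sequential(
            nn.Conv3d(64, 64, kernel_size=1), nn.ReLU(),
            nn.Conv3d(64, n_classes, kernel_size=1),
        )

    def forward(self, local_block, context_block):
        # local_block:   (B, 1, 24, 24, 24) voxels at native resolution
        # context_block: (B, 1, 24, 24, 24) voxels covering a ds_factor times
        #                larger real-world volume around the same centre point
        a = self.local_path(local_block)
        b = self.context_path(context_block)
        d = (b.shape[-1] - a.shape[-1]) // 2       # centre-crop context features
        b = b[..., d:d + a.shape[-3], d:d + a.shape[-2], d:d + a.shape[-1]]
        return self.classifier(torch.cat([a, b], dim=1))
```

In this sketch, the context path sees the same centre point as the local path but a ds_factor times larger real-world neighbourhood, which is what supplies the contextual information (neighbouring teeth, jaw bone) mentioned above, while the per-voxel output keeps the native resolution of the first block.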
[025] [025] In one embodiment, the method may further comprise: the computer determining one or more individual tooth voxel representations of the dento-maxillofacial structure based on the classified voxels; the computer providing each of the one or more individual tooth voxel representations to the input of a second 3D deep neural network, the second 3D deep neural network being trained to classify a voxel representation of an individual tooth into one of a plurality of tooth classes of a dentition, each tooth class being associated with a candidate tooth class label, the second trained 3D neural network generating, for each of the candidate tooth class labels, an activation value, an activation value associated with a candidate tooth class label defining the probability that a voxel representation of an individual tooth represents a tooth class indicated by that candidate tooth class label.
[026] [026] In one embodiment, the method may further comprise: determining a dentition taxonomy, which includes: defining candidate dentition states, each candidate state being formed by assigning a candidate tooth class label to each of a plurality of individual tooth voxel representations based on the activation values; and evaluating the candidate dentition states based on one or more conditions, at least one of the one or more conditions requiring different candidate tooth class labels to be assigned to different individual tooth voxel representations.
[027] [027] In one embodiment, the method may further comprise: the computer using a pre-processing algorithm to determine 3D positional feature information of the dento-maxillofacial structure, the 3D positional feature information defining, for each voxel in the voxel representation, information about the position of the voxel relative to the position of a dental reference object, for example a mandible, a dental arch and/or one or more teeth, in the image volume; and the computer adding the 3D positional feature information to the 3D data before providing the 3D data to the input of the first deep neural network, the added 3D positional feature information providing an additional data channel for the 3D data.
[028] [028] In one embodiment, the method may further comprise: the computer post-processing the voxels classified by the first 3D deep neural network based on a third trained neural network, the third deep neural network being trained to receive, at its input, voxels that have been classified by the first deep neural network and to correct voxels that are incorrectly classified by the first deep neural network, preferably the third neural network being trained using, as input, the voxels that are classified during the training of the first deep neural network and using, as target, the one or more 3D data sets of parts of the dento-maxillofacial structures of the 3D image data of the training set.
[029] [029] In a further aspect, the invention relates to a method for training a deep neural network system to process 3D image data of a dento-maxillofacial structure.
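As an illustrative aside (not part of the claimed method), one straightforward way of obtaining the individual tooth voxel representations referred to in [025] from the voxel-level output of the first network is a connected-component analysis; the sketch below assumes NumPy/SciPy, a background label of 0 and a simple minimum-size filter, none of which is prescribed by the text.

```python
import numpy as np
from scipy import ndimage

def extract_tooth_vois(class_map, background_label=0, min_voxels=100):
    """class_map: 3D integer array of per-voxel labels produced by the first
    network (0 = background/jaw, 1..32 = tooth classes, by assumption)."""
    vois = []
    components, n = ndimage.label(class_map != background_label)
    for i in range(1, n + 1):
        mask = components == i
        if mask.sum() < min_voxels:               # discard spurious fragments
            continue
        zs, ys, xs = np.where(mask)
        bbox = (slice(zs.min(), zs.max() + 1),
                slice(ys.min(), ys.max() + 1),
                slice(xs.min(), xs.max() + 1))
        vois.append({
            "mask": mask[bbox],                   # voxel representation of one tooth
            "bbox": bbox,
            # majority vote over the per-voxel labels inside this component
            "voxel_label": int(np.bincount(class_map[mask]).argmax()),
        })
    return vois
```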
The method may include a computer receiving training data, the training data including: 3D input data, preferably 3D cone beam CT (CBCT) data, the 3D input data defining one or more voxel representations of one or more dento-maxillofacial structures, respectively, a voxel being associated with a radiation intensity value, the voxels of a voxel representation defining an image volume; and the training data additionally including: 3D data sets of parts of the dento-maxillofacial structures represented by the 3D input data of the training data; the computer using a pre-processing algorithm to determine 3D positional feature information of the dento-maxillofacial structure, the 3D positional feature information defining, for each voxel in the voxel representation, information about the position of the voxel relative to the position of a dental reference object, for example a mandible, a dental arch and/or one or more teeth, in the image volume; and using the training data and the one or more 3D positional features to train the first deep neural network to classify voxels into one or more tooth classes, preferably into at least 32 tooth classes of a dentition.
[030] [030] In one embodiment, the method may additionally comprise: using voxels that are classified during the training of the first deep neural network and the one or more 3D data sets of parts of the dento-maxillofacial structures of the 3D image data of the training set to train a second neural network for post-processing the voxels classified by the first deep neural network, wherein the post-processing by the third neural network includes correcting voxels that are incorrectly classified by the first deep neural network.
[031] [031] In one embodiment, the method may comprise: using 3D data sets, which are voxel representations of individual teeth to be used as targets for training at least the first deep neural network, to select at least a subset of voxels from the 3D image data that is used as training input to the first deep neural network, the subset being used as input for training a third deep neural network; and using the tooth class label associated with the 3D data set that serves as target for training at least the first deep neural network as the target tooth class label for training the third deep neural network.
[032] [032] In one aspect, the invention may relate to a computer system, preferably a server system, adapted to automatically classify 3D image data of teeth, comprising: a computer-readable storage medium having computer-readable program code embodied therein, the program code including a classification algorithm and a deep neural network; and a processor, preferably a microprocessor, coupled to the computer-readable storage medium, wherein, responsive to the execution of the first computer-readable program code, the processor is configured to perform executable operations that include: receiving 3D image data, preferably 3D cone beam CT (CBCT) image data, the 3D image data defining an image volume of voxels, a voxel being associated with an intensity value or a radiation density value, the voxels defining a 3D representation of a dento-maxillofacial structure in the image volume, the dento-maxillofacial structure including a dentition; a trained deep neural network receiving the 3D image data at its input and classifying at least part of the voxels in the image volume into one or more tooth classes, preferably into at least 32 tooth classes of a dentition.
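The training method of [029]-[031] is specified only in terms of its inputs and targets. As a hedged illustration of what the supervised optimization of the first, voxel-level network could look like in practice, the loop below assumes PyTorch, a cross-entropy loss and an Adam optimizer; none of these choices is prescribed by the text, and the loader yielding matched local/context blocks with per-voxel target labels is a hypothetical component.

```python
import torch
import torch.nn as nn

def train_voxel_classifier(model, loader, epochs=10, lr=1e-4, device="cuda"):
    """loader is a hypothetical iterable yielding batches of
    (local_block, context_block, per_voxel_labels) training samples."""
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.to(device).train()
    for epoch in range(epochs):
        for local, context, labels in loader:
            local, context, labels = (t.to(device) for t in (local, context, labels))
            logits = model(local, context)        # (B, n_classes, D, H, W)
            loss = loss_fn(logits, labels)        # labels: (B, D, H, W) class indices
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()
        print(f"epoch {epoch}: last batch loss {loss.item():.4f}")
```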
[033] [033] In one aspect, the invention may relate to a computer, preferably a server system, adapted to automatically taxonomize 3D image data of teeth, comprising: a computer-readable storage medium having computer-readable program code embodied therein, the program code including a taxonomy algorithm and a trained deep neural network; and a processor, preferably a microprocessor, coupled to the computer-readable storage medium, wherein, responsive to the execution of the first computer-readable program code, the processor is configured to perform executable operations that include: receiving 3D image data, preferably 3D cone beam CT (CBCT) image data, the 3D image data defining an image volume of voxels, a voxel being associated with an intensity value or a radiation density value, the voxels defining a 3D representation of a dento-maxillofacial structure in the image volume, the dento-maxillofacial structure including a dentition; a trained deep neural network receiving the 3D image data at its input and classifying at least part of the voxels in the image volume into at least one or more tooth classes, preferably into at least 32 tooth classes of a dentition; and determining a dentition taxonomy, which includes defining candidate dentition states, each candidate state being formed by assigning a candidate label to each of the plurality of 3D image data sets based on activation values, and evaluating the candidate states based on one or more conditions, at least one of the one or more conditions requiring different candidate tooth labels to be assigned to different 3D image data sets.
[034] [034] In one aspect, the invention may relate to a computer system, preferably a server system, adapted to automatically taxonomize 3D image data of teeth, comprising: a computer-readable storage medium having computer-readable program code embodied therein, the program code including a taxonomy algorithm and trained deep neural networks; and a processor, preferably a microprocessor, coupled to the computer-readable storage medium, wherein, responsive to the execution of the first computer-readable program code, the processor is configured to perform executable operations that include: receiving 3D image data, preferably 3D cone beam CT (CBCT) image data, the 3D image data defining an image volume of voxels, a voxel being associated with an intensity value or a radiation density value, the voxels defining a 3D representation of a dento-maxillofacial structure in the image volume, the dento-maxillofacial structure including a dentition; a first trained deep neural network receiving the 3D image data at its input and classifying at least part of the voxels in the image volume into at least one or more tooth classes, preferably into at least 32 tooth classes of a dentition; a second trained deep neural network receiving the results of the first trained deep neural network and classifying the individual-tooth subsets of the received voxel representations into individual tooth class labels; and determining a dentition taxonomy, which includes defining candidate dentition states, each candidate state being formed by assigning a candidate label to each of the plurality of 3D image data sets based on activation values, and evaluating the candidate states based on one or more conditions, at least one of the one or more conditions requiring different candidate tooth class labels to be assigned to different 3D image data sets.
[035] [035] In one aspect, the invention relates to a client device, preferably a mobile client device, adapted to communicate with a server system, the server system being adapted to automatically taxonomize the 3D image data of the teeth as defined in claims 10-12, the client device comprising: a computer-readable storage medium that has computer-readable program code embodied therein, and a processor, preferably a microprocessor, coupled to the computer-readable storage medium and coupled to a display device, wherein, responsive to the execution of the first computer-readable program code, the processor is configured to carry out executable operations which include: transmitting 3D image data, preferably 3D cone beam CT (CBCT) image data, the 3D image data defining an image volume of voxels, a voxel being associated with an intensity value or a radiation density value, the voxels defining a 3D representation of the dento-maxillofacial structure in the image volume, the dento-maxillofacial structure including a dentition; requesting the server system to segment, classify and taxonomize the 3D image data of the teeth; receiving a plurality of 3D image data sets, each 3D image data set defining an image volume of voxels, the voxels defining a 3D tooth model in the image volume, the plurality of 3D image data sets forming the dentition; receiving one or more tooth class labels associated with one or more of the 3D image data sets; and rendering the one or more 3D image data sets and the one or more associated tooth class labels on a display.
[036] [036] In one aspect, the invention relates to a computer-implemented method for automated classification of 3D image data of teeth comprising: a computer receiving one or more 3D image data sets
[037] [037] In one embodiment, pre-processing additionally includes: determining a part of the longitudinal axis of a 3D tooth model and using the part of the longitudinal axis, preferably a point on that part of the axis, to position the 3D tooth model in the image volume; and, optionally, determining a centre of gravity and/or a high-volume part of the 3D tooth model, or of a slice thereof, and using the centre of gravity and/or the high-volume part of that slice to orient the 3D tooth model in the image volume.
[038] [038] Therefore, the invention can include a computer that includes a 3D deep neural network that classifies at least one 3D image data set representing an individual 3D tooth model by assigning at least one tooth label from a plurality of candidate tooth labels to the 3D image data set. Before being fed to the input of the 3D deep neural network, the 3D image data set is pre-processed by the computer in order to provide the 3D tooth model with a standardized orientation in the image volume. In this way, an arbitrary orientation of the 3D tooth model in the image volume is transformed into a uniform, normalized orientation, for example positioned in the middle of the image volume, with the longitudinal axis of the 3D tooth model parallel to the z-axis, the crown of the 3D tooth model pointing in the negative z direction and a radial axis through the centre of gravity of the 3D tooth model pointing in the positive x direction. Pre-processing based on tooth morphology addresses the problem that 3D deep neural networks are sensitive to rotational variations of a 3D tooth model.
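A minimal sketch of the orientation normalization described in [037]-[038] is given below; it assumes that the longitudinal axis can be approximated by a principal-component analysis of the occupied voxel coordinates and that the crown side can be found with a simple mass heuristic. Both assumptions are made for illustration only and are not taken from the disclosure.

```python
import numpy as np

def canonical_pose(tooth_mask):
    """tooth_mask: 3D boolean array, True for voxels belonging to one tooth."""
    coords = np.argwhere(tooth_mask).astype(float)     # (N, 3) voxel indices
    centre = coords.mean(axis=0)                       # centre of gravity
    centred = coords - centre
    # Principal axes of the voxel cloud; the first (largest-variance) axis
    # approximates the longitudinal, root-to-crown axis of the tooth.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    aligned = centred @ vt.T                           # coordinates in the new frame
    # Heuristic (an assumption): the crown half carries more voxel mass than
    # the root tips, so flip the frame if that mass lies on the positive side,
    # making the crown point in the negative direction of the first axis.
    if np.median(aligned[:, 0]) > 0:
        aligned[:, [0, 1]] *= -1                       # flip two axes, keep handedness
    return aligned, centre, vt
```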
[039] [039] In one embodiment, the computer can receive a plurality of 3D image data sets that are part of a dentition. In this case, the method may further comprise: determining a dentition taxonomy, which includes: defining candidate dentition states, each candidate dentition state being formed by assigning a candidate tooth label to each of the plurality of 3D image data sets based on the activation values; and evaluating the candidate dentition states based on one or more conditions, at least one of the one or more conditions requiring different candidate tooth labels to be assigned to different 3D image data sets, preferably the order in which candidate dentition states are evaluated being based on the height of the activation values associated with a candidate dentition state.
[040] [040] In a further aspect, the invention may relate to a computer-implemented method for automated taxonomy of 3D image data of teeth comprising: a computer receiving a plurality of 3D image data sets, a 3D image data set defining an image volume of voxels, the voxels defining a 3D tooth model in the image volume, the image volume being associated with a 3D coordinate system, the plurality of 3D image data sets being part of a dentition; the computer providing each of the 3D image data sets to the input of a trained deep neural network, the trained deep neural network classifying each of the 3D image data sets based on a plurality of candidate tooth labels of a dentition, wherein the classification of a 3D image data set includes generating, for each of the candidate tooth labels, an activation value, an activation value associated with a candidate label defining the probability that the 3D image data set represents a tooth type indicated by the candidate tooth label; and the computer determining a dentition taxonomy, which includes: defining candidate dentition states, each candidate state being formed by assigning a candidate tooth label to each of the plurality of 3D image data sets based on the activation values; and evaluating the candidate dentition states based on one or more conditions, at least one of the one or more conditions requiring different candidate tooth labels to be assigned to different 3D image data sets.
[041] [041] Therefore, the invention can additionally provide a highly accurate method for a fully automated taxonomy of 3D image data sets forming a dentition, using a trained 3D deep neural network and a post-processing method. During post-processing, the classification results of the plurality of 3D image data sets that form a dentition, that is, the candidate tooth labels and associated activation values for each 3D image data set, can be evaluated based on one or more conditions in order to provide an accurate dentition taxonomy.
[042] [042] In one embodiment, determining a dentition taxonomy may additionally include: defining candidate dentition states, each candidate dentition state being formed by assigning a candidate tooth label to each of the plurality of 3D image data sets based on the activation values; and evaluating the candidate dentition states based on one or more conditions, at least one of the one or more conditions requiring different candidate tooth labels to be assigned to different 3D image data sets.
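The text leaves open how the candidate dentition states of [039]-[042] are generated and evaluated. One possible, purely illustrative realization of the uniqueness condition is to treat the activation values as an assignment problem: each tooth object receives a distinct candidate label such that the summed activation is maximal. The sketch below assumes SciPy's linear_sum_assignment and FDI tooth numbers as the label set; the actual search and ordering strategy of the disclosure may differ.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# 32 FDI tooth numbers of a full adult dentition: 11-18, 21-28, 31-38, 41-48.
FDI_LABELS = [10 * quadrant + position
              for quadrant in (1, 2, 3, 4) for position in range(1, 9)]

def resolve_dentition(activations, labels=FDI_LABELS):
    """
    activations: (n_teeth, n_labels) array; activations[i, j] is the
                 probability that tooth object i carries candidate label j.
    Returns a dict mapping tooth-object index -> assigned candidate label,
    with every label used at most once and the total activation maximised.
    """
    rows, cols = linear_sum_assignment(activations, maximize=True)
    return {int(r): labels[c] for r, c in zip(rows, cols)}
```

For instance, calling resolve_dentition on a (14, 32) activation matrix labels fourteen detected tooth objects with fourteen distinct FDI numbers, which satisfies the condition that different candidate tooth labels are assigned to different 3D image data sets.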
[043] [043] In a still further aspect, the invention relates to a computer-implemented method for automated segmentation and classification of 3D image data of teeth comprising: a computer receiving 3D image data, preferably 3D cone beam CT (CBCT) image data, the 3D image data defining an image volume of voxels, a voxel being associated with an intensity value or a radiation density value, the voxels defining a 3D representation of a dento-maxillofacial structure in the image volume, the dento-maxillofacial structure including a dentition; a first trained deep neural network receiving the 3D image data at its input and classifying at least part of the voxels in the image volume into at least one of jaw, tooth and/or nerve voxels; segmenting the classified tooth voxels into a plurality of 3D image data sets, each 3D image data set defining an image volume of voxels, the voxels defining a 3D tooth model in the image volume; the computer providing each of the 3D image data sets to the input of a second trained deep neural network, the second trained deep neural network classifying each of the 3D image data sets based on a plurality of candidate tooth labels of a dentition, wherein the classification of a 3D image data set includes: generating, for each of the candidate tooth labels, an activation value, an activation value associated with a candidate label defining the probability that the 3D image data set represents a tooth type indicated by the candidate tooth label.
[044] [044] The invention can thus also provide a method of fully automated segmentation and classification of 3D image data, for example a 3D (CB)CT image data set, which includes a dento-maxillofacial structure that includes a dentition, wherein 3D image data sets, each 3D image data set forming a 3D tooth model, are generated using a first trained deep neural network, and wherein the 3D image data sets are classified by assigning a label to each of the 3D image data sets.
[045] [045] In one embodiment, segmentation may include: a pre-processing algorithm using the voxels to determine one or more 3D positional features of the dento-maxillofacial structure, the one or more 3D positional features being configured for input to the first deep neural network, a 3D positional feature defining position information of the voxels in the image volume, the first deep neural network receiving the 3D image data and the one or more determined positional features at its input and using the one or more positional features to classify at least part of the voxels in the image volume into at least one of jaw, tooth and/or nerve voxels.
[046] [046] In an embodiment, the position information can define a distance, preferably a perpendicular distance, between voxels in the image volume and a first dental reference plane in the image volume; a distance between voxels in the image volume and a first dental reference object in the image volume; and/or positions of accumulated intensity values in a second reference plane of the image volume, wherein an accumulated intensity value at a point in the second reference plane includes accumulated intensity values of voxels on or in the vicinity of the normal running through that point of the reference plane.
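By way of illustration only, the snippet below computes two positional features of the kind named in [045]-[046] with NumPy: the signed perpendicular distance of every voxel to an axial dental reference plane, and intensity values accumulated along the normal of that plane, stacked as extra input channels next to the original intensities. Treating the z-axis as the plane normal, and the specific channel layout, are assumptions of the example.

```python
import numpy as np

def positional_features(volume, reference_z):
    """volume: 3D array of (CB)CT intensities with axes (z, y, x);
    reference_z: index of an axial dental reference plane (an assumption)."""
    z = np.arange(volume.shape[0], dtype=float)
    # Feature 1: signed perpendicular distance of each voxel to the plane z = reference_z.
    dist_to_plane = np.broadcast_to((z - reference_z)[:, None, None], volume.shape)
    # Feature 2: intensities accumulated along the plane normal (the z-axis here),
    # one value per (y, x) position, repeated for every voxel in that column.
    accumulated = np.broadcast_to(volume.sum(axis=0), volume.shape)
    # Stack the original intensities and both features as separate channels.
    return np.stack([volume, dist_to_plane, accumulated], axis=0).astype(np.float32)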
[047] [047] In one embodiment, the method may comprise: determining a dentition taxonomy, which includes: defining candidate dentition states, each candidate state being formed by assigning a candidate tooth label to each of the plurality of 3D image data sets based on activation values; and evaluating the candidate dentition states based on one or more conditions, at least one of the one or more conditions requiring different candidate tooth labels to be assigned to different 3D image data sets.
[048] [048] In an additional aspect, the invention can relate to a computer system, preferably a server system, adapted to automatically classify 3D image data of teeth, comprising: a computer-readable storage medium having computer-readable program code embodied therein, the program code including a pre-processing algorithm and a trained deep neural network; and a processor, preferably a microprocessor, coupled to the computer-readable storage medium, wherein, responsive to the execution of the first computer-readable program code, the processor is configured to perform executable operations that comprise: receiving one or more 3D image data sets, a 3D image data set defining an image volume of voxels, the voxels defining a 3D tooth model in the image volume, the image volume being associated with a 3D coordinate system; pre-processing each of the 3D image data sets, the pre-processing including: positioning and orienting each of the 3D tooth models in the image volume based on tooth morphology, preferably the 3D shape of a tooth and/or a slice of the 3D shape; providing each of the pre-processed 3D image data sets to the input of a trained deep neural network, the trained deep neural network classifying each of the pre-processed 3D image data sets based on a plurality of candidate tooth labels of a dentition, wherein the classification of a 3D image data set includes generating, for each of the candidate tooth labels, an activation value, an activation value associated with a candidate tooth label defining the probability that the 3D image data set represents a tooth type indicated by the candidate tooth label.
[049] [049] In a still further aspect, the invention can relate to a computer system, preferably a server system, adapted to automatically taxonomize 3D image data of teeth, comprising: a computer-readable storage medium having computer-readable program code embodied therein, the program code including a taxonomy algorithm and a trained deep neural network; and a processor, preferably a microprocessor, coupled to the computer-readable storage medium, wherein, responsive to the execution of the first computer-readable program code, the processor is configured to perform executable operations that comprise: receiving a plurality of 3D image data sets, a 3D image data set defining an image volume of voxels, the voxels defining a 3D tooth model in the image volume, the image volume being associated with a 3D coordinate system, the plurality of 3D image data sets forming a dentition; providing each of the 3D image data sets to the input of a trained deep neural network, the trained deep neural network classifying each of the 3D image data sets based on a plurality of candidate tooth labels of a dentition, wherein the classification of a 3D image data set includes generating, for each of the candidate tooth labels, an activation value, an activation value associated with a candidate label defining the probability that the 3D image data set represents a tooth type indicated by the candidate tooth label; and determining a dentition taxonomy, which includes defining candidate dentition states, each candidate state being formed by assigning a candidate label to each of the plurality of 3D image data sets based on activation values, and evaluating the candidate states based on one or more conditions, at least one of the one or more conditions requiring different candidate tooth labels to be assigned to different 3D image data sets.
[050] [050] In one aspect, the invention may relate to a computer system, preferably a server system, adapted to automatically segment and classify 3D image data of teeth, comprising: a computer-readable storage medium having computer-readable program code embodied therein, the program code including a segmentation algorithm and a first and a second deep neural network; and a processor, preferably a microprocessor, coupled to the computer-readable storage medium, wherein, responsive to the execution of the first computer-readable program code, the processor is configured to perform executable operations that include: receiving 3D image data, preferably 3D cone beam CT (CBCT) image data, the 3D image data defining an image volume of voxels, a voxel being associated with an intensity value or a radiation density value, the voxels defining a 3D representation of a dento-maxillofacial structure in the image volume, the dento-maxillofacial structure including a dentition; a first trained deep neural network receiving the 3D image data at its input and classifying at least part of the voxels in the image volume into at least one of jaw, tooth and/or nerve voxels; segmenting the classified tooth voxels into a plurality of 3D image data sets, each 3D image data set defining an image volume of voxels, the voxels defining a 3D tooth model in the image volume; providing each of the 3D image data sets to the input of a second trained deep neural network, the second trained deep neural network classifying each of the pre-processed 3D image data sets based on a plurality of candidate tooth labels of a dentition, wherein the classification of a 3D image data set includes generating, for each of the candidate tooth labels, an activation value, an activation value associated with a candidate label defining the probability that the 3D image data set represents a tooth type indicated by the candidate tooth label.
[051] [051] In a further aspect, the invention may relate to a client device, preferably a mobile client device, adapted to communicate with a server system, the server system being adapted to automatically taxonomize 3D image data of teeth as set out above, the client device comprising: a computer-readable storage medium that has computer-readable program code embodied therein, and a processor, preferably a microprocessor, coupled to the computer-readable storage medium and coupled to a display device, wherein, responsive to the execution of the first computer-readable program code, the processor is configured to perform executable operations which include: transmitting one or more first 3D image data sets to the server system, a 3D image data set defining an image volume of voxels, the voxels defining a 3D tooth model in the image volume, the image volume being associated with a 3D coordinate system; requesting the server system to taxonomize the 3D image data of the teeth; receiving one or more second 3D image data sets from the server system, the one or more second 3D image data sets being generated by the server system based on the one or more first 3D image data sets, the generation including processing each of the 3D image data sets, the processing including positioning and orienting each of the 3D tooth models in the image volume based on tooth morphology, preferably the 3D shape of one of the teeth and/or a slice of the 3D shape; receiving one or more tooth labels associated with the one or more second 3D image data sets, respectively; and rendering the one or more second 3D image data sets and the one or more associated tooth labels on a display.
[052] [052] In one aspect, the invention may relate to a client device, preferably a mobile client device, adapted to communicate with a server system, the server system being adapted to automatically segment and classify 3D image data of teeth as defined in claim 13, the client device comprising: a computer-readable storage medium that has computer-readable program code embodied therein, and a processor, preferably a microprocessor, coupled to the computer-readable storage medium and coupled to a display device, wherein, responsive to the execution of the first computer-readable program code, the processor is configured to perform executable operations which comprise: transmitting 3D image data, preferably 3D cone beam CT (CBCT) data, the 3D image data defining an image volume of voxels, a voxel being associated with an intensity value or a radiation density value, the voxels defining a 3D representation of a dento-maxillofacial structure in the image volume, the dento-maxillofacial structure including a dentition; requesting the server system to segment and classify the 3D image data; receiving a plurality of 3D image data sets, each 3D image data set defining an image volume of voxels, the voxels defining a 3D tooth model in the image volume, the plurality of 3D image data sets forming the dentition; receiving one or more tooth labels associated with one or more of the 3D image data sets; and rendering the one or more 3D image data sets and the one or more associated tooth labels on a display.
[053] [053] The invention can also relate to a computer program product comprising software code portions configured to, when executed in the memory of a computer, execute any of the methods set out above.
[054] [054] The invention will be further illustrated with reference to the attached drawings, which schematically show embodiments according to the invention. It will be understood that the invention is in no way restricted to these specific embodiments.
BRIEF DESCRIPTION OF THE DRAWINGS
[055] [055] Figure 1 depicts a high-level schematic of a computer system configured to automatically taxonomize the teeth of a dentition according to an embodiment of the invention; figure 2 depicts a flow chart of training a deep neural network to classify individual teeth according to an embodiment of the invention;
[056] [056] In this disclosure, embodiments of computer systems and computer-implemented methods that use deep neural networks to classify 3D image data representing teeth are described. The 3D image data can comprise voxels forming a dento-maxillofacial structure that comprises a dentition. For example, the 3D image data can include 3D (CB)CT image data (generated by a CT scanner). Alternatively, the 3D image data may comprise a surface mesh of the teeth (for example, generated by a 3D optical scanner). A computer system according to the invention can comprise at least one deep neural network that is trained to classify a 3D image data set that defines an image volume of voxels, where the voxels represent 3D tooth structures in the image volume and where the image volume is associated with a 3D coordinate system. The computer system can be configured to perform a training process that iteratively trains (optimizes) one or more deep neural networks based on one or more training sets that can include 3D representations of dental structures. The format of a 3D representation of an individual tooth can be optimized for input to a 3D deep neural network. Optimization can include pre-processing the 3D image data, where pre-processing can include determining 3D positional features. A 3D positional feature can be determined by aggregating information from the originally received 3D image data, insofar as it can be beneficial for accurate classification, and adding such a feature to the 3D image data as a separate channel.
[057] [057] Once trained, the first deep neural network can receive 3D image data of a dentition and classify the voxels of the 3D image data. The output of the neural network may include different collections of voxel data, where each collection may represent a distinct part (for example, individual teeth, individual nerves, sections of jaw bone) of the 3D image data. Voxels classified as individual teeth can be post-processed to reconstruct an accurate 3D representation of each classified volume.
[058] [058] The classified voxels, or the reconstructed volume per individual tooth, can be further post-processed to normalize the orientation, scaling and position within a specific 3D bounding box, if applicable. This (normalized) reconstructed voxel set, which optionally contains the shape of an individual tooth, together with its associated subset of the originally received 3D image data (normalized in the same way, if applicable), can be presented to the input of a second 3D deep neural network that is trained to determine activation values associated with a set of candidate tooth labels. The second 3D deep neural network can receive 3D image data representing (part of) an individual tooth at its input, and generate at its output a single set of activations, one for each of the candidate tooth labels.
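As a concrete but non-authoritative illustration of the second 3D deep neural network of [058], the sketch below shows a small PyTorch classifier that maps a normalized, fixed-size voxel representation of one tooth (here two channels: the binary tooth shape and the associated (CB)CT intensities) to one activation value per candidate tooth label. The input size of 64x64x64 voxels, the layer dimensions and the use of a softmax are assumptions of the example, not details taken from the disclosure.

```python
import torch
import torch.nn as nn

class ToothLabelNet(nn.Module):
    """Sketch of the per-tooth classification network: normalised voxel
    representation in, one activation value per candidate tooth label out."""

    def __init__(self, n_labels=32, in_channels=2):
        super().__init__()
        # in_channels = 2: binary tooth shape plus associated (CB)CT intensities.
        self.features = nn.Sequential(
            nn.Conv3d(in_channels, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 8 * 8 * 8, 256), nn.ReLU(),
            nn.Linear(256, n_labels),
        )

    def forward(self, x):                      # x: (B, 2, 64, 64, 64)
        logits = self.head(self.features(x))
        return torch.softmax(logits, dim=1)    # activation value per candidate label
```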
[059] [059] In this way, two sets of classification results per individual tooth object can be obtained: a first set of classification results classifying voxels into different voxel classes (for example, individual tooth classes, or 32 possible tooth types) generated by the first 3D deep neural network, and a second set of classification results classifying a voxel representation of an individual tooth into different tooth classes (for example, again, 32 possible tooth types, or a different classification such as incisor, canine, molar, etc.) generated by the second 3D deep neural network. The plurality of tooth objects that form (part of) a dentition can finally be post-processed in order to determine the most accurate taxonomy possible, making use of the predictions resulting from the first and, optionally, the second neural network, which are both adapted to classify the 3D data of individual teeth.
[060] [060] The computer system comprising at least one neural network trained to automatically classify a 3D image data set that forms a dentition, the training of the network, the pre-processing of the 3D image data before it is fed into the neural network, as well as the post-processing of the results produced by the first neural network, are described in more detail below.
[061] [061] Figure 1 depicts a high-level schematic of a computer system that is configured to automatically taxonomize teeth in 3D image data according to an embodiment of the invention. Computer system 100 may comprise a processor for pre-processing the input data 102, i.e. the 3D image data associated with a dentition, into a 3D representation of the teeth. The processor can derive the 3D representation of the teeth 104 from 3D image data of real-world maxillofacial structures (which include teeth and may include spatial information), in which the 3D image data can be generated using known techniques, such as a CBCT scanner or optical scans of complete tooth shapes. The 3D representation of the teeth may have a 3D data format that is most beneficial as input data for the 3D deep neural network processor 106, which is trained for the classification of teeth. The 3D data format can be selected in such a way that the accuracy of the set of classified teeth 110 (the output of computer system 100) is optimized. The conversion of the input data into such a 3D representation can be referred to as pre-processing of the input data. The computer system may also include a processor 108 for post-processing the output of the 3D deep neural network processor. The post-processor can include an algorithm to correct voxels that have been incorrectly classified by the first deep neural network. The post-processor may additionally include an algorithm for further classification of a 3D image data set representing an individual tooth. The post-processor can additionally make use of a rule-based system that applies knowledge about dentitions on top of the output of a deep neural network. The computer system and its processors will be described in more detail below with reference to the figures.
[062] [062] Figure 2 depicts a flow chart of training a deep neural network to classify individual teeth according to an embodiment of the invention. In order to train the 3D deep neural network to classify a 3D representation of an individual tooth, different data sources can be used.
[063] [063] As shown in this figure, several sources 206, 212 of 3D image data 214 can be selected to train the 3D deep neural network. These data sources may require preprocessing 216. A 3D data source may include 3D CT image data 206, in particular 3D (CB)CT image data, that represents a dento-maxillofacial structure and includes a dentition. Often, the 3D image data represents a voxel representation of a dento-maxillofacial structure that includes part of the jaw bone and teeth. In this case, the system may additionally comprise a computer system for automatically segmenting individual teeth 208 from the 3D CT image data. Such a system can produce volumes of interest (VOIs) 210, where each VOI can comprise a volume of voxels selected from the voxels that form the complete (CB)CT scan. The selected volume of voxels can include voxels that represent a tooth, including the crown and roots. The computer system for automatic segmentation may include a 3D deep neural network processor that is trained to segment teeth in 3D image data that represents a dento-maxillofacial structure. The details of the computer system for automatic segmentation of a voxel representation of a dento-maxillofacial structure are described in more detail below in relation to figures 7-17. [064] [064] An additional source of 3D image data for an individual tooth can be 3D image data of a complete tooth, that is, both crown and roots, generated by an optical scanner 212. Such a scanner can generate a 3D representation of the teeth in the form of a 3D surface mesh 214. Optionally, system 208 can be configured to produce a surface mesh based on a segmented tooth. [065] [065] The deep neural network that will be trained to classify individual teeth into their correctly labeled classes may require a 3D data set representing an individual tooth to be converted into a 3D data format that is optimized for a 3D deep neural network. Such an optimized 3D data set increases the accuracy of the classification, since the 3D deep neural network is sensitive to intraclass variations between samples, especially variations in the orientation of the 3D tooth model. To this end, a pre-processing step 216 can be applied. [066] [066] For each voxel representation of an individual tooth 218, a correct label 220, that is, a label representing the tooth number (correct class or index number) of the voxel representation of the tooth, is required to train the 3D deep learning network 222 to correctly identify the desired labels. In this way, the 3D deep neural network is trained to automatically classify the voxel representations of the teeth. Due to the symmetrical nature of a dentition, samples can be mirrored to expand the number of samples that will be provided for training. Similarly, samples can be augmented by adding slightly modified versions that, in 3D space, have been arbitrarily rotated or stretched within feasible limits. [067] [067] Figure 3 represents a computer system for automated taxonomy of 3D tooth models according to an embodiment of the invention. The computer system can include two different modules, a first training module 328 for performing a process to train the 3D deep neural network 314 and a second classification module for performing a classification process based on new input data. As shown in figure 3, the training module can comprise one or more repositories or databases 306, 310 of data sources intended for training.
Such a repository can be populated via an input 304 that is configured to receive input data, for example, 3D image data that includes dentitions, which can be stored in various formats, together with the respective desired labels. At least a first repository or database 306 can be used to store 3D (CB)CT image data of dentitions and associated labels. This database can be used by a computer system 307 to segment and extract volumes of interest 308, each representing a volume of voxels comprising the voxels of an individual tooth, which can be used for training. In one embodiment, the computer system 307 can be configured to segment volumes of interest per individual tooth class, that is, producing both a volume of interest and a target label. Similarly, a second repository or database 310 can be used to store other 3D data formats, for example, 3D surface meshes generated by optical scanning, and individual tooth labels that can be used during network training. [068] [068] The 3D training data can be preprocessed 312 into a 3D voxel representation that is optimized for the deep neural network 314. The training process can end at this stage, as the 3D deep neural network processor 314 may only require training on individual tooth samples. In one embodiment, 3D tooth data, such as a 3D surface mesh, can also be determined based on segmented 3D image data that originates from (CB)CT scans. [069] [069] When using the classification module 330 to classify a new dentition 316, again, multiple data formats can be used during the translation of the physical dentition into a 3D representation that is optimized for the deep neural network 314. The system can make use of 3D (CB)CT image data of the dentition 318 and use a computer system 319 that is configured to segment and extract volumes of interest comprising individual tooth voxels 320. Alternatively, another representation, such as a surface mesh per tooth 322 resulting from optical scans, can be used. Note, again, that the (CB)CT data can be used to extract different 3D representations of volumes of interest. [070] [070] Preprocessing 312 into the format required by the deep neural network 314 can then be started. The outputs of the deep neural network can be fed into a post-processing step 324 designed to make use of knowledge about dentitions to guarantee the accuracy of the taxonomy across the set of labels applied to the teeth of the dentition. In one embodiment, the correct labels can be fed back into the training data in order to increase future accuracy after additional training of the deep neural network. The presentation of results to an end user can be facilitated by a rendering engine that is adapted to render a 3D and/or 2D representation of the automatically classified and taxonomized 3D tooth data. Examples of rendered, classified 3D tooth data are described in relation to figures 20A and 20B. [071] [071] Figures 4A and 4B represent schematic representations illustrating the normalization of individual tooth data according to various embodiments of the invention. In particular, figure 4A represents a flowchart of processing 3D meshes that represent the surface of an individual tooth, which can be derived from a dentition or from other sources. The purpose of this pre-processing step is to create a 3D voxel representation of the data that is optimized for interpretation by the 3D deep neural network processor.
As shown in figure 4A, the process can include a step of interpolating the 3D surface meshes 402 (segmented from a dentition or from another source) into a 3D voxel representation 404. In such a step, the 3D surface meshes can be represented as a 3D voxel volume that has a predetermined initial voxel value, for example, a "zero" or "origin" value where no tooth surface is present, and a "one" or "tooth present" value for those voxels that match or almost match the 3D surface defined by the meshes. The 3D voxel representation thus formed includes a volume, for example, a rectangular box, of voxels in which the 3D surface of a tooth is represented by voxels that have a second voxel value, while the rest of the voxels have a first voxel value. In one embodiment, the method can also include the step of setting the voxels enclosed by the surface mesh to the second voxel value, so that the 3D voxel representation represents a solid object in 3D space. [072] [072] In one embodiment, a voxel representation (which can be determined by segmenting an individual tooth, for example, from a (CB)CT scan of a dentition) can also be processed based on process step 404 and the steps that follow. [073] [073] The (rectangular) volume of voxels can be associated with a coordinate system, for example, a 3D Cartesian coordinate system, so that the 3D voxel representation of a tooth can be associated with an orientation and dimensions. The orientations and/or dimensions of the tooth models, however, may not be standardized. The 3D deep neural network is sensitive to tooth orientation and may have difficulties in classifying a tooth model that has a random orientation and non-standard dimensions in the 3D image volume. [074] [074] In order to address this problem, during pre-processing, the orientation and dimensions of separate tooth models (3D voxel representations) can be normalized. This means that each of the 3D voxel data samples (a 3D voxel data sample representing a tooth generated in steps 404 and/or 406) can be transformed in such a way that the dimensions and orientation of the samples are uniform (step 410). The preprocessor can achieve such normalized orientation and/or dimensions using spatial information from the dentition source. [075] [075] The spatial information can be determined by the preprocessor by examining the dimensions and orientation of each sample at the dentition source (step 408). For example, when the tooth samples of a dentition originate from a single 3D (CB)CT data stack that defines a 3D image volume, the dimensions and orientation of each tooth sample can be determined by the system. Alternatively, spatial information can be provided with the individual 3D voxel representations. [076] [076] The preprocessor can examine the orientation and dimensions derived from the original 3D (CB)CT data stack and, if these values do not correspond with the desired input format for the deep learning network, a transformation can be applied. Such a transformation can include a 3D rotation in order to reorient a sample in 3D space (step 410) and/or a 3D scaling in order to rescale the dimensions of a sample in 3D space (step 412). [077] [077] Figure 4B represents a method of normalizing the orientation and/or dimensions of tooth data according to an embodiment of the invention.
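Before turning to figure 4B, the interpolation of steps 402-404 can be illustrated with a minimal sketch. This is not the patent's implementation: the triangle-sampling fill, the grid size and the use of numpy are assumptions made purely for illustration.

import numpy as np

def mesh_to_voxels(vertices, faces, grid_size=64):
    """vertices: (N, 3) float array; faces: (M, 3) int array of vertex indices."""
    volume = np.zeros((grid_size,) * 3, dtype=np.uint8)       # initial "zero"/"origin" value everywhere
    lo, hi = vertices.min(axis=0), vertices.max(axis=0)
    scale = (grid_size - 1) / np.maximum(hi - lo, 1e-6)       # map world coordinates to voxel indices
    for face in faces:
        a, b, c = vertices[face]
        # Sample points on the triangle using random barycentric coordinates.
        u = np.random.rand(100, 2)
        outside = u.sum(axis=1) > 1
        u[outside] = 1 - u[outside]                           # fold samples back into the triangle
        points = a + u[:, :1] * (b - a) + u[:, 1:] * (c - a)
        idx = np.round((points - lo) * scale).astype(int)
        volume[idx[:, 0], idx[:, 1], idx[:, 2]] = 1           # "one"/"tooth present" on the surface
    return volume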
In the event that the original 3D image data of the teeth in a dentition do not have intra-sample consistency of dimensions and/or orientation, and/or if the dimensions and/or orientation are unknown, several methods can be used to achieve a standardized 3D voxel representation for all samples that form the dentition. [078] [078] This normalization process can use one or more transformations that are based on the morphology of a tooth: for example, based on the structure of the tooth, a longitudinal geometric axis can be determined and, in addition, a position of a center of gravity of the tooth structure can be determined, which - due to the non-symmetrical shape of the tooth - can be positioned at a distance from the longitudinal geometric axis. Based on this information, a normal orientation of a tooth can be determined in a 3D image space, where the top side, the bottom side, the back side and the front side of a tooth can be uniformly defined. Such a determination, for example, of the longitudinal geometric axis can be carried out through principal component analysis, or by other means, as described below. [079] [079] As shown in figure 4B, the orientation and dimensions of a 3D tooth sample in a 3D image space can be based on a predetermined coordinate system. The geometric axes x, y and z can be chosen as indicated; however, other choices are also possible. When considering a completely arbitrary orientation of a 3D tooth sample 422, the rotations about two geometric axes (x and y in this example) can be defined by determining the two points 424 and 426 in the sample that have the greatest distance between them. The line between these two points can define (part of) a longitudinal geometric axis of the tooth structure. The sample can be translated so that a predetermined point on this part of the longitudinal geometric axis, for example, the midpoint between the two points 424 and 426, coincides with the center of the image space. In addition, the sample can be rotated about this central point in such a way that the part of the longitudinal geometric axis is parallel to the z geometric axis, resulting in a reorientation of the sample (as shown in 428). Therefore, this transformation defines a longitudinal geometric axis based on the shape of the tooth, uses a point (for example, the middle) on the longitudinal geometric axis to position the tooth in the 3D image volume (for example, in the center of the volume) and aligns the longitudinal geometric axis with a geometric axis, for example, the z geometric axis, of the 3D image volume coordinate system. [080] [080] Additionally, a center of gravity 431 of the dental structure can be determined. Furthermore, a plane 430 - in this case an xy plane, normal to the longitudinal geometric axis of the tooth structure and positioned at the center of the longitudinal geometric axis - can be used to determine whether most of the sample volume and/or the center of gravity is above or below the plane. A rotation can be used to ensure that most of the volume is on a selected side of the xy plane 430; in the case of this example, the sample is rotated in such a way that the largest volume points in the negative z direction, resulting in the transformation shown in 432.
Therefore, this transformation uses the tooth volume below and above a plane normal to the longitudinal geometric axis of the tooth structure, and/or the position of the center of gravity in relation to that plane, in order to determine an upper and a lower side of the tooth structure and to align the tooth structure with the geometric axis accordingly. For any identical sample received in an arbitrary orientation, there will be only one aspect of the orientation that may differ after these transformation step(s), which is the rotation about the z axis, as indicated by 434. [081] [081] There are different ways to define this rotation. In one embodiment, a plane can be used that is rotated about the center point and the z geometric axis. The system can find the rotation of the plane for which the volume on one side of this plane is maximized. The determined rotation can then be used to rotate the sample in such a way that the maximum volume is oriented in a selected direction along a selected geometric axis. For example, as shown in 446, the amount of volume in the positive x direction is maximized, effectively setting the plane found in 436 parallel to a predetermined plane, for example, the z-y plane, as shown in 448. [082] [082] In a further embodiment, instead of volumes, the center of gravity can be used to define the rotation. For example, the system can construct a radial axis part that runs through the center of gravity and a point on the longitudinal geometric axis. Subsequently, a rotation about the longitudinal geometric axis can be selected by the system in such a way that the radial axis part is oriented in a predetermined direction, for example, the positive x direction. [083] [083] In yet another embodiment, the 3D tooth structure can be sliced at a predetermined point on the longitudinal geometric axis of the tooth structure. For example, in 438, the tooth structure can be sliced at a point on the longitudinal geometric axis that is a predetermined distance from the base side of the tooth structure. In this way, a 2D data slice can be determined. In this 2D slice, the two points with the greatest distance from each other can be determined. The line between these points can be referred to as the lateral geometric axis of the tooth structure. The sample can then be rotated in such a way that the lateral geometric axis 440 is parallel to a predetermined geometric axis (for example, the y geometric axis). This can leave two possible rotations about the longitudinal geometric axis 434 (since there are two possibilities for the line 440 to be parallel to the y geometric axis). [084] [084] The selection between these two rotations can be determined based on the two areas defined by the slice and the lateral geometric axis. Subsequently, the structure can be rotated about the longitudinal geometric axis in such a way that the largest area is oriented in a predetermined direction, for example, as shown in 442, in the direction of the negative x geometric axis. [085] [085] Finally, the 3D deep learning network expects each sample to have the same number of voxels and the same voxel resolution in each dimension. For this purpose, pre-processing can include a step 412 of determining a volume into which each potential sample will fit and placing each sample centered in this space. It is understood that, depending on the format of the data source, one or multiple of the steps in figure 4 can be omitted.
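A minimal sketch of the orientation normalization just described is given below, assuming a binary voxel sample as input. The approximation of the longitudinal axis by the first principal component, the 180-degree flip used for the up/down decision and the numpy-only implementation are illustrative assumptions, not the patent's method.

import numpy as np

def normalize_orientation(volume):
    pts = np.argwhere(volume > 0).astype(float)               # coordinates of occupied voxels
    centered = pts - pts.mean(axis=0)                         # translate the sample to the origin
    # Approximate the longitudinal axis (the direction of greatest extent)
    # by the first principal component of the occupied voxels.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    axis = vt[0] / np.linalg.norm(vt[0])
    # Rotation that maps the longitudinal axis onto the z axis (Rodrigues formula).
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(axis, z)
    s, c = np.linalg.norm(v), float(axis @ z)
    if s < 1e-9:                                              # axis already (anti)parallel to z
        rot = np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    else:
        vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
        rot = np.eye(3) + vx + vx @ vx * ((1 - c) / s ** 2)
    rotated = centered @ rot.T
    # Up/down decision: if most of the mass lies above the mid-plane, rotate the
    # sample by 180 degrees about the x axis so the larger volume points towards -z.
    if (rotated[:, 2] > 0).sum() > (rotated[:, 2] < 0).sum():
        rotated[:, 1:] *= -1
    return rotated                                            # normalized voxel coordinates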
As an example, when working with volumes of interest (VOIs) from a 3D (CB)CT data stack, steps 402 to 406 can be omitted. [086] [086] Figure 5 represents an example of a 3D deep neural network architecture for the classification of individual teeth for use in the methods and systems for automated taxonomy of 3D image data as described in this application. The network can be implemented using 3D convolutional layers (3D CNNs). The convolutions can use an activation function known in the field. A plurality of 3D convolutional layers, 504-508, can be used, where minor variations in the number of layers and their defining parameters, for example, different activation functions, kernel amounts, the use of subsampling and scaling, and additional functional layers, such as dropout layers and batch normalization, can be used in the implementation without losing the essence of the design of the deep neural network. [087] [087] In order to reduce the dimensionality of the internal representation of the data within the deep neural network, a 3D max pooling layer 510 can be used. At this point in the network, the internal representation can be passed to a densely connected layer 512 that is intended as an intermediary for translating the representation in 3D space into potential label activations, in particular, tooth type labels. The final or output layer 514 can have the same dimensionality as the desired number of encoded labels and can be used to determine an activation value (analogous to a prediction) per potential label 518. [088] [088] The network can be trained based on preprocessed 3D image data 502 (for example, 3D voxel representations of individual teeth, as described in relation to figure 4). In one embodiment, the 3D image data may comprise a plurality of image channels, for example, a fourth dimension comprising additional information. Single-channel 3D image data can comprise one data point per x, y and z location of a voxel (for example, density values in the case of (CB)CT scans, or a binary value ("zeros"/"ones") in the case of a binary voxel representation, as described in relation to the process of figure 4A). In contrast, multi-channel 3D image data can include two or more different data points per voxel (comparable, for example, to color images, which usually comprise three channels of information, one for red, one for green and one for blue). Therefore, in one embodiment, a 3D deep neural network can be trained to process multi-channel 3D image data. [089] [089] In one embodiment, such multi-channel 3D image data may, for example, comprise a first channel comprising the original 3D (CB)CT image data of an individual tooth, and a second channel containing a processed version of the same tooth, which can be produced using a method described in relation to figure 4A. Offering both of these sets provides information about both the exact segmented shape (binary, represented in 3D) and the original image information (density values) that may be relevant to the classification problem; offering both increases the potential for accurate classification. [090] [090] For each sample (which is a 3D representation of an individual tooth), a corresponding representation of the correct label 516 can be used to determine a loss between the desired and actual output 514. This loss can be used during training as a measure for adjusting the parameters in the layers of the deep neural network.
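A minimal PyTorch sketch of a network in the spirit of figure 5 is given below. The layer counts, kernel sizes, channel widths, the 64-voxel input size and the two input channels are assumptions made for illustration; the description above only fixes the overall structure (a stack of 3D convolutions, a 3D max pooling layer 510, a densely connected layer 512 and an output layer 514 with one activation per candidate label).

import torch
import torch.nn as nn

class ToothClassifier(nn.Module):
    def __init__(self, in_channels=2, n_labels=32):
        super().__init__()
        self.features = nn.Sequential(                        # 3D convolutional layers 504-508
            nn.Conv3d(in_channels, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=2),                      # 3D max pooling layer 510
        )
        self.dense = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 32 * 32 * 32, 256), nn.ReLU(),     # densely connected layer 512
            nn.Linear(256, n_labels),                         # output layer 514: one activation per label 518
        )

    def forward(self, x):                                     # x: (batch, channels, 64, 64, 64)
        return self.dense(self.features(x))

# Example: activations = ToothClassifier()(torch.randn(1, 2, 64, 64, 64))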
Optimizer functions can be used during training to assist in the efficiency of the training effort. The network can be trained for any number of iterations until the internal parameters lead to a desired accuracy of the results. When properly trained, an unlabeled sample can be presented as an input and the deep neural network can be used to derive a prediction for each potential label. [091] [091] Therefore, as the deep neural network is trained to classify a 3D data sample of a tooth into one of a plurality of tooth types, for example, 32 tooth types in the case of an adult dentition, the output of the neural network will comprise activation values and associated potential tooth type labels. The potential tooth type label with the highest activation value may indicate to the system that the 3D data sample of a tooth most likely represents a tooth of the type indicated by that label. The potential tooth type label with the lowest or a relatively low activation value may indicate to the system that the 3D data set of a tooth is least likely to represent a tooth of the type indicated by such a label. [092] [092] Figure 6 represents a post-processing flow chart according to an embodiment of the invention. In order to make use of the information available when considering a source set of individual tooth objects for a single dentition 602, this post-processing can be used to determine the most feasible label assignment per tooth. Each 3D data set representing a tooth 606 can be processed by the deep neural network to obtain a most likely prediction value per possible candidate label 608. There may be multiple predictions per tooth object (or individual tooth 3D data set), following, for example, classification of the tooth objects by multiple methods. [093] [093] Candidate dentition states (or, in short, candidate states) can be generated, in which each 3D data set of a tooth is assigned a candidate tooth label. An initial candidate state can be created 610 by assigning to each 3D data set of a tooth the candidate tooth label for which it has the highest activation value. A candidate (dentition) state in this context can refer to a single assignment of a tooth label to each tooth object (represented, for example, by a 3D image data set) that forms the dentition. This initial state may not be the desired final state, as it may not satisfy the conditions required for a final resolved dentition state. The size of a state, for example, the number of teeth present in a dentition, can vary from dentition to dentition. [094] [094] A priority value can be assigned to each candidate state, which can be used to determine an order in which candidate states are assessed. Priority values can be set using the desired objectives in order to arrive at an optimally resolved solution. In one embodiment, a priority value for a candidate state can be determined based on the activation values, for example, the sum of the activation values (of which there can be multiple per tooth object) that are assigned to the candidate labels of the candidate state. Alternatively and/or additionally, in one embodiment, a priority value can be determined based on the number of uniquely assigned candidate labels and/or the number of duplicate label assignments. [095] [095] The pool 612 of candidate dentition states and priority values can be stored in a memory of the computer system (where each candidate state can include candidate tooth labels and associated priority values).
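A minimal sketch of how such candidate states and priority values might be represented is given below, assuming one activation value per candidate label and per tooth object, and taking the priority of a state as the sum of its assigned activation values. The dictionary-based data model and the use of a heap for ordering are illustrative assumptions, not the patent's data structures.

import heapq

def initial_state(activations):
    """activations: {tooth_id: {label: activation}}; returns {tooth_id: label} (state 610)."""
    return {tooth: max(labels, key=labels.get) for tooth, labels in activations.items()}

def priority(state, activations):
    # Priority value of a candidate state: sum of the assigned activation values.
    return sum(activations[tooth][label] for tooth, label in state.items())

def push_state(pool, state, activations):
    # heapq is a min-heap, so the priority is negated in order to pop the best state first.
    heapq.heappush(pool, (-priority(state, activations), sorted(state.items())))

pool = []                                                     # pool 612 of candidate states
# Example: push_state(pool, initial_state(acts), acts) for some activations dictionary "acts".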
[096] [096] Candidate dentition states 614 can be selected in the order of the assigned priority values and evaluated in an iterative process in which the computer can verify whether predetermined conditions are met (as shown in step 616). The conditions can be based on knowledge of a dentition. For example, in one embodiment, a condition may be that a candidate label for a tooth may occur only once (exclusively) in a single candidate dentition state. Additionally, in some embodiments, information associated with the position of the COG for each 3D tooth data set can be used to define one or more conditions. For example, when using the FDI adult tooth numbering system, tooth labels with a 1x and 2x index (x = 1, ..., 8) can be part of the upper jaw and tooth labels with a 3x and 4x index (x = 1, ..., 8) can be part of the lower jaw. Here, the indices 1x, 2x, 3x, 4x (x = 1, ..., 8) define four quadrants and the tooth number x within them. These tooth labels can be verified based on the COGs that are associated with each 3D representation of a tooth. In further embodiments, the plurality of tooth labels can be considered as an ordered arrangement of teeth of different tooth types in their jaw, producing additional conditions concerning the appropriate assignment of labels in a dentition in relation to each COG. [097] [097] As another example, in one embodiment, label activations gathered from (one of) the deep neural network(s) can be limited to a tooth type class such as "incisor", "canine", "molar". With a state able to accommodate such classifications, and with feasible conditions that can be verified (for example, two incisors per quadrant), the described method can efficiently assess any condition to be satisfied. [098] [098] The (order of) assessment of candidate states can be based on the priority values assigned by the neural network. In particular, the resolved candidate states are optimized based on the priority values. For example, when the priority values are derived from the assigned activation values of one or more deep neural networks, the final solution presented by the system 620 (i.e., the output) will be the (first) candidate dentition state that satisfies the conditions while maximizing the assigned activation values (i.e., the sum of the activation values is maximal). [099] [099] When, during the evaluation of a candidate dentition state, one or more conditions are not met, new candidate state(s) 618 can be generated. Considering the enormous space of possible states, it would not be feasible to generate and consider all possible candidate states. Therefore, new candidate state(s) can be generated based on the candidate tooth labels that did not meet the conditions 616. For example, in one embodiment, if a subset of the 3D tooth representations of a candidate dentition state happens to include two or more of the same tooth labels (and thus conflicts with the condition that a dentition state must contain a set of uniquely assigned tooth labels), new candidate state(s) attempting to resolve this particular exception will be generated. Similarly, in one embodiment, if the 3D tooth representations of a candidate dentition state contain conflicting COGs, new candidate state(s) attempting to resolve this particular exception can be generated. These new candidate state(s) can be generated step by step, based on the original conflicting state, while maximizing their expected priority value.
For example, in order to determine a next candidate state, for each label that has an exception, the (original) tooth representation(s) assigned to the particular label in the state that has the exception(s) can be switched to the representation that produces the next highest expected priority. [0100] [0100] In the above way, in some embodiments, the image data [0101] [0101] Once trained, the deep neural network can receive a 3D image data stack of a dento-maxillofacial structure and classify the voxels of the 3D image data stack. Before the data is presented to the trained deep neural network, the data can be pre-processed so that the neural network can classify voxels efficiently and precisely. The output of the neural network may include different collections of voxel data, where each collection may represent a distinct part, for example, teeth or jaw bone, of the 3D image data. The classified voxels can be post-processed in order to reconstruct a precise 3D model of the dento-maxillofacial structure. [0102] [0102] The computer system comprising a neural network trained to automatically classify the voxels of dento-maxillofacial structures, the training of the network, the pre-processing of the 3D image data before it is fed into the neural network, as well as the post-processing of voxels that are classified by the neural network, are described in more detail below. [0103] [0103] Figure 7 schematically represents a computer system for classification and segmentation of 3D maxillofacial structures according to an embodiment of the invention. In particular, computer system 702 can be configured to receive a stack of 3D image data 704 of a dento-maxillofacial structure. The structure may include individual jaw, individual tooth and individual nerve structures. The 3D image data may comprise voxels, that is, elements of 3D space, each associated with a voxel value, for example, a gray scale value or a color value, which represents a radiation intensity or a density value. Preferably, the 3D image data stack may include CBCT image data according to a predetermined format, for example, the DICOM format or a derivative thereof. [0104] [0104] The computer system may comprise a preprocessor 706 for preprocessing the 3D image data before it is fed into the input of a first 3D deep learning neural network 712, which is trained to produce a 3D set of classified voxels as an output 714. As will be described in more detail below, the 3D deep learning neural network can be trained according to a predetermined training scheme, so that the trained neural network is able to precisely classify the voxels in the 3D image data stack into voxels of different classes (for example, voxels associated with individual tooth, jaw bone and/or nerve tissue). Preferably, the classes associated with the individual teeth consist of all the teeth that may be present in the healthy dentition of an adult, resulting in 32 individual tooth classes. The 3D deep learning neural network can comprise a plurality of connected 3D convolutional neural network (3D CNN) layers. [0105] [0105] The computer system can additionally comprise a processor 716 for precisely reconstructing 3D models of different parts of the dento-maxillofacial structure (e.g., individual tooth, jaw and nerve) using the voxels classified by the 3D deep learning neural network.
As will be described in more detail below, part of the classified voxels, for example, the voxels that are classified as belonging to a tooth structure or a jaw structure, is fed into a further 3D deep learning neural network 720, which is trained to reconstruct 3D volumes for the dento-maxillofacial structures, for example, the shape of the jaw 724 and the shape of a tooth 726, based on the voxels that have been classified as belonging to such structures. Other parts of the classified voxels, for example, the voxels that were classified by the 3D deep neural network as belonging to the nerves, can be post-processed using an interpolation function 718 and stored as 3D nerve data 722. The task of determining the volume that represents a nerve from the classified voxels is of a nature that may currently be beyond the capacity of (the processing power available to) a deep neural network. Furthermore, the classified voxels presented may not contain the information that would be suitable for a neural network to solve this problem. Therefore, to accurately and efficiently post-process classified nerve voxels, an interpolation of the classified voxels is used. After the post-processing of the 3D data of the various parts of the dento-maxillofacial structure, [0106] [0106] In CBCT scans, the radiodensity (measured in Hounsfield Units (HU)) is inaccurate because different areas in the scan appear with different gray scale values, depending on their relative positions in the organ being scanned. The HU values measured from the same anatomical area with both CBCT and medical-grade CT scanners are not identical and are therefore unreliable for determining site-specific, radiographically identified bone density. [0107] [0107] Furthermore, dental CBCT systems do not employ a standardized system for scaling the gray levels that represent the reconstructed density values. These values are, as such, arbitrary and do not allow the assessment of bone quality. In the absence of such standardization, it is difficult to interpret the gray levels, or impossible to compare the values resulting from different machines. [0108] [0108] The teeth and jaw bone structures have similar densities, so that it is difficult for a computer to distinguish between voxels that belong to teeth and voxels that belong to a jaw. Additionally, CBCT systems are very sensitive to so-called beam hardening, which produces dark bands between two high-attenuation objects (such as metal or bone), with surrounding bright bands. [0109] [0109] In order to make the 3D deep learning neural network robust against the aforementioned problems, the 3D neural network can be trained using a module 738 to make use of 3D models of parts of the dento-maxillofacial structure represented by the 3D image data. The 3D training data 730 can be correctly aligned with a CBCT image presented at 704 for which the associated target output is known (for example, [0110] [0110] In order to tackle this problem, in an embodiment, optically produced training data 730, that is, accurate 3D models of (parts of) the dento-maxillofacial structure, can be used instead of, or at least in addition to, manually segmented training data. The dento-maxillofacial structures that are used to produce the training data can be scanned using a 3D optical scanner. Such 3D optical scanners are known in the art and can be used to produce high-quality 3D jaw and tooth surface data.
3D surface data can include 3D surface meshes 732 that can be filled (determining which specific voxels are part of the volume enclosed by the mesh) and used by a voxel classifier. [0111] [0111] Additionally, during the training process, the CT training data can be pre-processed by a feature extractor 708, which can be configured to determine 3D positional features. A dento-maxillofacial feature can encode at least spatial information associated with one or more parts of the imaged dento-maxillofacial structure. For example, in one embodiment, a manually generated 3D positional feature may include a 3D curve that represents (part of) the jaw bone, in particular the dental arch, in the 3D volume containing the voxels. One or more weight parameters can be assigned to points along the 3D curve. The value of a weight parameter can be used to encode a translation in 3D space from voxel to voxel. Instead of incorporating, for example, an encoded version of the original space in which the image stack is received, the encoded space is specific to the dento-maxillofacial structures detected at the input. The feature extractor can determine one or more curves that approximate one or more curves of the jaw and/or teeth (for example, the dental arch) by examining the voxel values that represent radiation intensity or density values and fitting one or more curves (for example, a polynomial) through certain voxels. Derivatives of (parts of) dental arch curves of a 3D CT image data stack can be stored as a positional feature mapping 710. [0112] [0112] In another embodiment, such 3D positional features can, for example, be determined using a (trained) machine learning method, such as a 3D deep neural network that is trained to derive relevant information from the entire received 3D data set. [0113] [0113] Figure 8 represents a flow chart of training a deep neural network to classify dento-maxillofacial 3D image data according to an embodiment of the invention. The training data is used to train a 3D deep learning neural network so that it can automatically classify the voxels of a 3D CT scan of a dento-maxillofacial structure. As shown in this figure, a representation of a dento-maxillofacial complex 802 can be provided to the computer system. The training data may include a stack of CT image data 804 of a dento-maxillofacial structure and an associated 3D model, for example, 3D optical scanning data 806 of the same dento-maxillofacial structure. Examples of such 3D CT image data and 3D optical scanning data are shown in figures 9A and 9B. Figure 9A represents DICOM slices associated with different planes of a 3D CT scan of a dento-maxillofacial structure, for example, an axial plane 902, a frontal or coronal plane 904 and a sagittal plane 906. Figure 9B represents 3D optical scanning data of a dento-maxillofacial structure. The computer can form 3D surface meshes 808 of the dento-maxillofacial structure based on the optical scanning data. In addition, an alignment function 810 can be employed, which is configured to align the 3D surface meshes with the 3D CT image data. After alignment, the representations of the 3D structures that are provided to the computer input use the same spatial coordinate system. Based on the CT image data and the aligned 3D surface meshes, 3D positional features 812 and classified voxel data of the optically scanned 3D model 814 can be determined.
The positional features and the classified voxel data can then be provided to the input of the deep neural network 816, together with the image stack 804. [0114] [0114] Therefore, during the training phase, the 3D deep learning neural network receives 3D CT training data and positional features extracted from the 3D CT training data as input data, and the classified training voxels associated with the 3D CT training data are used as target data. An optimization method can be used to learn optimal values of the deep neural network parameters by minimizing a loss function that represents the deviation of the deep neural network output from the target data (i.e., the classified voxel data), which represents the desired output for a predetermined input. When the minimization of the loss function converges to a certain value, the training process can be considered suitable for the application. [0115] [0115] The training process depicted in figure 8, using 3D positional features in combination with training voxels that can be (at least partially) derived from 3D optical scan data, provides a high-quality training set for the 3D deep learning neural network. After the training process, the trained network is able to precisely classify the voxels of a stack of 3D CT image data. [0116] [0116] Figures 10A and 10B represent high-level schematic representations of deep neural network architectures for use in methods and systems that are configured to classify and segment 3D voxel data of a dento-maxillofacial structure. As shown in figure 10A, the network can be implemented using 3D convolutional neural networks (3D CNNs). The convolutional layers can employ an activation function associated with the neurons in the layers, such as a sigmoid function, a tanh function, a relu function, a softmax function, etc. A plurality of 3D convolutional layers can be used, where minor variations in the number of layers and their defining parameters, for example, different activation functions, kernel amounts and sizes, and additional functional layers, such as dropout layers, can be used in the implementation without losing the essence of the design of the deep neural network. [0117] [0117] As shown in figure 10A, the network can include a plurality of convolutional paths, for example, a first convolutional path associated with a first set of 3D convolutional layers 1006 and a second convolutional path associated with a second set of 3D convolutional layers 1008. The 3D image data 1002 can be fed into the inputs of both the first and second convolutional paths. As described in relation to figure 4, in one embodiment, the 3D image data may comprise a plurality of channels, for example, an additional fourth dimension comprising additional information, such as the 3D positional feature data. [0118] [0118] Additionally, in some embodiments, the network may include at least one additional (third) convolutional path associated with a third set of 3D convolutional layers 1007. The third convolutional path can be trained to encode 3D features derived from received 3D positional feature data associated with the voxels, which is offered as a separate input to the third path. This third convolutional path can, for example, be used in the event that such 3D positional feature information is not offered as an additional image channel of the received 3D image data. [0119] [0119] The function of the different paths is illustrated in more detail in figure 10B.
As shown in this figure, the voxels that represent the 3D image data are fed into the input of the neural network. These voxels are associated with a predetermined volume, which can be referred to as the image volume 10011. Each of the subsequent 3D convolution layers of the first path 10031 can perform a 3D convolution operation on the first voxel blocks 10011 of the 3D image data. During processing, the output of one 3D convolution layer is the input of a subsequent 3D convolution layer. In this way, each 3D convolutional layer can generate a 3D feature map that represents parts of the 3D image data that are fed into the input. A 3D convolutional layer that is configured to generate such feature maps can therefore be referred to as a 3D CNN feature layer. [0120] [0120] As shown in figure 10B, the convolutional layers of the second path 10032 can be configured to process second blocks of voxels 10012 of the 3D image data. Each second block of voxels is associated with a first block of voxels, where the first and second blocks of voxels have the same center in the image volume. The volume of the second block is greater than the volume of the first block. Furthermore, the second block of voxels represents a downsampled version of an associated first block of voxels. The downsampling can be based on the use of a well-known interpolation algorithm. The downsampling factor can be any appropriate value. In one embodiment, the downsampling factor can be selected between 20 and 2, preferably between 10 and 3. [0121] [0121] Therefore, the 3D deep neural network can comprise at least two convolutional paths. A first convolutional path 10031 can define a first set of 3D CNN feature layers (for example, 5-20 layers), which are configured to process input data (for example, first blocks of voxels at predetermined positions in the image volume) at a first voxel resolution, for example, the target voxel resolution (i.e., the resolution of the voxels of the 3D image data to be classified). Similarly, a second convolutional path can define a second set of 3D CNN feature layers (for example, 5-20 layers), which are configured to process input data at a second voxel resolution (for example, second blocks of voxels 10012, where each block of the second voxel blocks has the same central point as its associated block of the first voxel blocks 10011). Here, the second resolution is lower than the first resolution. Therefore, the second blocks of voxels represent a larger volume in real-world dimensions than the first blocks. In this way, the first 3D CNN feature layers process the first voxel blocks in order to generate 3D feature maps, and the second 3D CNN feature layers process the second voxel blocks in order to generate 3D feature maps that include information about the (direct) neighborhood of the associated first blocks of voxels that are processed by the first 3D CNN feature layers. [0122] [0122] The second path thus enables the neural network to determine contextual information, that is, information about the context (for example, its surroundings) of the voxels of the 3D image data that are presented to the input of the neural network. By using multiple (parallel) convolutional paths, both the 3D image data (the input data) and the contextual information about the voxels of the 3D image data can be processed in parallel.
Contextual information is important for classifying dento-maxillofacial structures, which typically include closely packed dental structures that are difficult to distinguish. Especially in the context of the classification of individual teeth, it is important that information at the native resolution of the input is available (containing at least the detailed information regarding the individual tooth shape), as well as contextual information (containing at least information regarding the location within a dentition and the neighboring structures, such as other teeth, tissue, air, bone, etc.). [0123] [0123] In one embodiment, a third convolutional path can be used for processing 3D positional features. In an alternative embodiment, instead of using a third convolutional path for processing the 3D positional features, 3D positional information, including the 3D positional features, can be associated with the 3D image data that is offered to the input of the deep neural network. In particular, a 3D data stack can be formed in which each voxel is associated with an intensity value and positional information. Thus, the positional information can be paired with the applicable received voxel, for example, by adding the 3D positional feature information as additional channels to the received 3D image information. Therefore, in this embodiment, a voxel of a 3D voxel representation of a 3D maxillofacial structure at the input of the deep neural network can not only be associated with a voxel value that represents, for example, a radiation intensity value, but also with 3D positional information. Thus, in this embodiment, during the training of the convolutional layers of both the first and the second convolutional paths, information derived both from the 3D image features and from the 3D positional features can be encoded in these convolutional layers. The output of the sets of 3D CNN feature layers is then merged and fed to the input of a set of fully connected 3D CNN layers 1010, which are trained to derive the intended classification of the voxels 1012 that are offered at the input of the neural network and processed by the 3D CNN feature layers. [0124] [0124] The fully connected layers can be configured in such a way that they are fully connected only with respect to the connections used to derive the output voxels of an output voxel block. This means that they can be applied in a fully convolutional manner, as is known in the art, that is, the set of parameters associated with the fully connected layers is the same for each output voxel. This can lead to each output voxel in a voxel block being both trained and inferred in parallel. Such a configuration of the fully connected layers reduces the number of parameters required for the network (compared with densely fully connected layers over the whole of a block), while reducing both training and inference time (a set or block of voxels is processed in one pass, instead of just a single output voxel). [0125] [0125] The sets of 3D CNN feature layers can be trained (through their learnable parameters) to derive and pass on, in an optimal way, the information that can be determined from their specific input, and the fully connected layers encode parameters that determine the way in which the information from the three previous paths should be combined to provide optimally classified voxels 1012. Subsequently, the classified voxels can be presented in the image space 1014.
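A minimal PyTorch sketch of such a two-path design is given below: a first path operates on a block at the native resolution, a second path on a downsampled block around the same center, and the merged feature maps are passed through 1x1x1 convolutions that act as fully connected layers applied in a fully convolutional manner. The channel counts, layer counts and the 34-class output (32 teeth plus jaw and nerve) are illustrative assumptions, not the patent's configuration.

import torch
import torch.nn as nn

class DualPathVoxelClassifier(nn.Module):
    def __init__(self, in_channels=1, n_classes=34):
        super().__init__()
        def conv_path():
            return nn.Sequential(
                nn.Conv3d(in_channels, 24, 3, padding=1), nn.ReLU(),
                nn.Conv3d(24, 48, 3, padding=1), nn.ReLU(),
            )
        self.native = conv_path()                             # first path: native-resolution block
        self.context = conv_path()                            # second path: downsampled neighborhood block
        self.head = nn.Sequential(                            # "fully connected" layers 1010 as 1x1x1 convolutions
            nn.Conv3d(96, 64, 1), nn.ReLU(),
            nn.Conv3d(64, n_classes, 1),                      # per-voxel activations 1012
        )

    def forward(self, block_native, block_context):
        a = self.native(block_native)                         # (batch, 48, D, H, W)
        b = self.context(block_context)                       # same layout, lower resolution
        b = nn.functional.interpolate(b, size=a.shape[2:])    # bring the context map to the native grid
        return self.head(torch.cat([a, b], dim=1))

# Example: out = DualPathVoxelClassifier()(torch.randn(1, 1, 24, 24, 24), torch.randn(1, 1, 8, 8, 8))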
Therefore, the output of the neural network consists of classified voxels in an image space that corresponds to the image space of the voxels at the input. [0126] [0126] Here, the output (the last layer) of the fully connected layers can provide a plurality of activations for each voxel. Such a voxel activation can represent a measure of probability (a prediction) that defines the probability that a voxel belongs to one of a plurality of classes, for example, dental structure classes, for example, an individual tooth, a jaw section and/or a nerve structure. For each voxel, the voxel activations associated with the different dental structures can be thresholded in order to obtain a classified voxel. [0127] [0127] Figures 11-13 illustrate methods of determining 3D positional features in a 3D image data stack representing a dento-maxillofacial structure. [0128] [0128] In order to determine the reference planes and/or reference objects in the image volume that are used in the classification process, the feature analysis function can determine voxels having a predetermined intensity value, or a value above or below a predetermined intensity value. For example, the voxels associated with bright intensity values can relate to teeth and/or jaw tissue. In this way, information about the position of the teeth and/or the jaw and the orientation (for example, a rotational angle) in the image volume can be determined by the computer. If the feature analysis function determines that the rotational angle is greater than a predetermined amount (for example, greater than 15 degrees), the function can correct the rotational angle to zero, as this is more beneficial for accurate results. [0129] [0129] Figure 11 illustrates an example of a flow chart 1102 of a method of determining manually generated 3D positional features in 3D image data 1104, for example, a stack of 3D CT image data. This process may include determining one or more 3D positional features of the dento-maxillofacial structure, where the one or more 3D positional features are configured to be input to a specific path of the deep neural network (as discussed in relation to figure 10B above). A manually generated 3D positional feature defines position information of the voxels in the image volume in relation to reference planes or reference objects in the image volume, for example, a distance, for example, a perpendicular distance, between voxels in the image volume and a reference plane in the image volume that separates the upper and lower jaws. It can also define the distance between voxels in the image volume and a dental reference object, for example, a dental arch in the image volume. It can additionally define positions of accumulated intensity values in a second reference plane of the image volume, where an accumulated intensity value at a point in the second reference plane includes the accumulated intensity values of the voxels on or near the normal running through that point on the reference plane. Examples of 3D positional features are described below. [0130] [0130] In order to determine a reference object that provides positional information of the dental arch in the 3D image data of the dento-maxillofacial structure, a fitting algorithm can be used to determine a curve, for example, a curve that follows a polynomial formula, which fits predetermined points in the 3D image data. [0131] [0131] In one embodiment, a point cloud of intensity values in an axial plane (an xy plane) of the image volume can be determined.
An accumulated intensity value of a point in such an axial plane can be determined by adding the voxel values of the voxels positioned on the normal running through that point in the axial plane. The intensity values thus obtained in the axial plane can be used to find a curve that approximates a dental arch of the teeth. [0132] [0132] An example of a reference object for use in determining manually generated 3D positional features, in this case a curve that approximates such a dental arch, is provided in figure 12. In this example, a point cloud in the axial (xy) plane indicating areas of high intensity values (bright white areas) can indicate areas of tooth or jaw structures. In order to determine a dental arch curve, the computer can determine the areas in an axial plane of the image volume associated with bright voxels (for example, voxels that have an intensity value above a predetermined threshold value) that can be identified as tooth or jaw voxels. These high-intensity areas can be used to determine an arrangement of bright areas that approximates the dento-maxillofacial arch. In this way, a dental arch curve can be determined that approximates an average of the dento-maxillofacial arches of the upper and lower jaws, respectively. In another embodiment, separate dental arch curves associated with the upper and lower jaws can be determined. [0133] [0133] Different features can be defined based on such a curve (or curves). Figures 13A-13D represent examples of positional features of 3D image data according to various embodiments of the invention. [0134] [0134] Figure 13A represents (left) an image of a sagittal-plane slice of a stack of 3D image data and (right) an associated visualization of a so-called height feature of the same slice. Such a height feature can encode a z position (a height 1304) of each voxel in the image volume of the 3D CT image data stack relative to a reference plane 1302. The reference plane (for example, an axial or xy plane) is determined as the best approximation of the xy plane with approximately equal distance to both the upper and lower jaws and their constituent teeth. [0135] [0135] Other 3D positional features can be defined to encode spatial information in the xy space of a stack of 3D image data. In one embodiment, such a positional feature may be based on a curve that approximates (part of) the dental arch. Such a positional feature is illustrated in figure 13B, which represents (left) a slice of a stack of 3D image data and (right) a visualization of the so-called displacement feature for the same slice. This displacement feature is based on the curve that approximates the dental arch 1306 and defines the relative distance 1308 measured along that curve. Here, zero distance can be defined as the point 1310 on the curve where the derivative of the second-degree polynomial is (approximately) zero. The displacement distance increases when moving in either direction along the x geometric axis from this point (for example, the point where the derivative is zero). [0136] [0136] A further 3D positional feature based on the dental arch curve can define the shortest (perpendicular) distance of each voxel in the image volume to the dental arch curve 1306. This positional feature can therefore be referred to as the distance feature. An example of such a feature is provided in figure 13C, which represents (left) a slice of the 3D image data stack and (right) a visualization of the distance feature for the same slice.
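A minimal numpy sketch of the height, dental-arch-curve and distance features described above is given below. The intensity threshold, the choice of z as the height axis, the second-degree polynomial fit and the sampling density of the curve are assumptions made for illustration, not the patent's parameters.

import numpy as np

def positional_features(volume, z_reference, intensity_threshold=1000.0):
    """volume: 3D array indexed as (x, y, z); z_reference: index of the reference plane 1302."""
    # Height feature (figure 13A): z position of each slice relative to the reference plane.
    height = np.arange(volume.shape[2]) - z_reference

    # Accumulate intensities along the normal of the axial plane and fit the
    # dental arch curve 1306 as a second-degree polynomial through bright points.
    axial = volume.sum(axis=2)
    xs, ys = np.nonzero(axial > intensity_threshold)
    coeffs = np.polyfit(xs, ys, deg=2)

    # Distance feature (figure 13C): shortest distance of each (x, y) column to the curve.
    curve_x = np.linspace(xs.min(), xs.max(), 100)
    curve = np.stack([curve_x, np.polyval(coeffs, curve_x)], axis=1)
    gx, gy = np.meshgrid(np.arange(volume.shape[0]), np.arange(volume.shape[1]), indexing="ij")
    cols = np.stack([gx.ravel(), gy.ravel()], axis=1).astype(float)
    distance = np.linalg.norm(cols[:, None, :] - curve[None, :, :], axis=2).min(axis=1)
    return height, coeffs, distance.reshape(volume.shape[:2])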
For this distance feature, zero distance means that the voxel is positioned on the dental arch curve 1308. [0137] [0137] A further 3D positional feature can also define positional information of individual teeth. An example of such a feature (which can also be referred to as a dental feature) is provided in figure 13D, which represents (left) a slice of the 3D image data stack and (right) a visualization of the dental feature for the same slice. The dental feature can provide information to be used for determining the probability of finding voxels of certain teeth at a certain position in the voxel space. This feature can, following a determined reference plane, such as 1302, encode a separate sum of voxels along the normal with respect to any plane (for example, the xy plane or any other plane). This information thus provides the neural network with a "view" of all the information of the original space summed along the normal of the plane. This view is larger than what would be processed when this feature is excluded, and may provide a means of differentiating whether a hard structure is present based on all the information in the chosen direction of the space (as illustrated at 1312 for the xy plane). [0138] [0138] Therefore, figures 11-13 show that a 3D positional feature defines information about the voxels of a voxel representation that is provided to the input of a deep neural network that is trained to classify the voxels. The information can be aggregated from all (or a substantial part of) the information available in the voxel representation, where, during aggregation, the position of a voxel in relation to a dental reference object can be taken into account. Additionally, the information is aggregated in such a way that it can be processed per position of a voxel in the first voxel representation. [0139] [0139] Figures 14A-14D represent examples of the output of a deep learning neural network trained according to an embodiment of the invention. In particular, figures 14A-14D represent 3D images of voxels that are classified using a deep learning neural network that is trained using a training method described in relation to figure 8. Figure 14A represents a 3D computer rendering of the voxels that the deep learning neural network classified as individual teeth, individual jaw and nerve tissue. Voxels can be classified by the neural network into voxels that belong to individual tooth structures in figure 14B, individual jaw structures in figure 14C or nerve structures in figure 14D. The individual voxel representations of the structures resulting from the deep neural network are marked as such in the figures. For example, figure 14B shows the individual tooth structures that have been output by the network, here labeled with their FDI tooth index label (quadrant four, 4x index labels, have been omitted for clarity in the figure). As shown by figures 14B-14D, the classification process is accurate, but there are still a large number of voxels that are missing or that are erroneously classified. For example, voxels that have been classified with the FDI 37 tooth index label contain a structural extension 1402 that does not accurately represent the tooth structure in the real world. Similarly, voxels classified with the FDI 38 tooth index label produce a surface imperfection 1404. Note, however, that the network has classified the vast majority of voxels for this tooth, although it is only partially present in the received 3D image data set.
As shown in figure 14D, these problems can be even more pronounced for the classified nerve voxels, which are missing parts 1406 that are present in the real-world nerve.

[0140] In order to address the problem of discrepancies in the classified voxels (which form the output of the first deep learning neural network), the voxels can be post-processed. Figure 15 depicts a flowchart of post-processing of classified voxels of 3D dento-maxillofacial structures according to an embodiment of the invention. In particular, figure 15 depicts a flowchart of post-processing of the voxel data of dento-maxillofacial structures that are classified using a deep learning neural network, as described in relation to figures 7-14 of this application.

[0141] As shown in figure 15, the process may include a step of dividing the classified voxel data 1502 of the 3D dento-maxillofacial structure into voxels that are classified as individual jaw voxels 1504, individual tooth voxels 1506 and voxels that are classified as nerve data 1508.

[0142] The post-processing deep learning neural network encodes representations of both the classified teeth and the jaw (sections). During training of the post-processing deep learning neural network, the parameters of the neural network are tuned such that the output of the first deep learning neural network is translated into the most feasible 3D representation of these dento-maxillofacial structures. In this way, imperfections in the classified voxels can be reconstructed 1512. Additionally, the surfaces of the 3D structures can be smoothed 1514 so that the best feasible 3D representation can be generated. In one embodiment, excluding the 3D CT image data stack as a source of information for the post-processing neural network makes this post-processing step robust against unwanted variances in the image stack.

[0143] Due to the nature of the (CB)CT images, the output of the first deep learning neural network will suffer from the (aforementioned) potential artifacts, such as motion artifacts due to patient movement, beam hardening, etc. Another source of noise is variance in the image data captured by different CT scanners. This variance results in several factors being introduced, such as varying amounts of noise in the image stack, varying voxel intensity values representing the same (real-world) density, and potentially others. The effects that the aforementioned artifact and noise sources have on the output of the first deep learning neural network can be removed, or at least substantially reduced, by the post-processing deep learning neural network, leading to segmented jaw voxels and segmented teeth voxels.

[0144] Classified nerve data 1508 can be post-processed separately from the jaw and teeth data. The nature of the nerve data, which represents long, thin filament structures in the CT image data stack, makes this data less suitable for post-processing by a deep learning neural network. Instead, the classified nerve data is post-processed using an interpolation algorithm in order to obtain segmented nerve data 1516. For this purpose, voxels that are classified as nerve voxels and that are associated with a high probability (for example, a probability of 95% or more) are used by the fitting algorithm to build a 3D model of the nerve structures. Subsequently, the 3D jaw, teeth and nerve data sets 1518 can be processed into respective 3D models of the dento-maxillofacial structure.
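By way of illustration, a minimal sketch of this nerve-fitting step is given below. It assumes that the first deep neural network outputs a per-voxel nerve probability volume; the 95% threshold follows the example above, while the use of SciPy's smoothing-spline routines, the voxel spacing and the crude ordering of the points along one axis are illustrative assumptions rather than features prescribed by this disclosure.

```python
import numpy as np
from scipy.interpolate import splev, splprep


def fit_nerve_curve(nerve_prob, spacing=(0.4, 0.4, 0.4), threshold=0.95, samples=200):
    """Fit a smooth 3D curve through high-confidence nerve voxels.

    nerve_prob : (z, y, x) array with the per-voxel nerve probability produced
                 by the first deep neural network.
    spacing    : voxel size in mm (illustrative value).
    Returns an array of `samples` points along the interpolated nerve path.
    """
    # keep only voxels classified as nerve with high probability (e.g. 95% or more)
    idx = np.argwhere(nerve_prob >= threshold)
    if len(idx) < 4:
        raise ValueError("not enough high-confidence nerve voxels to fit a curve")
    pts = idx * np.asarray(spacing)                  # voxel indices -> real-world coordinates
    order = np.argsort(pts[:, 2])                    # crude ordering along one axis (assumption)
    z, y, x = pts[order].T
    tck, _ = splprep([x, y, z], s=float(len(pts)))   # smoothing spline through the points
    u = np.linspace(0.0, 1.0, samples)
    return np.stack(splev(u, tck), axis=1)           # (samples, 3) nerve centreline
```

In a full pipeline, the fitted centreline would subsequently be converted back into a voxel or surface representation to obtain the segmented nerve data 1516; that step is omitted here.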
[0145] Figure 16 depicts an example of an architecture of a deep learning neural network that is configured for post-processing of the classified voxels of a 3D dento-maxillofacial structure according to an embodiment of the invention. The post-processing deep learning neural network may have an architecture that is similar to the first deep learning neural network, including a first path formed by a first set of 3D CNN feature layers 1604, which is configured to process the input data (in this case, a portion of the classified voxel data) at the target resolution. The deep learning neural network additionally includes a second set of 3D CNN feature layers 1606, which is configured to process the context of the input data that is processed by the first 3D CNN feature layers, but at a lower resolution than the target resolution. The outputs of the first and second sets of 3D CNN feature layers are then fed to the input of a set of fully connected 3D CNN layers 1608 in order to reconstruct the classified voxel data such that it closely represents a 3D model of the 3D dento-maxillofacial structure. The output of the fully connected 3D CNN layers provides the reconstructed voxel data.

[0146] The post-processing neural network can be trained using the same targets as the first deep learning neural network, which represent the same desired output. During training, the network is made as broadly applicable as possible by providing noise on the inputs to represent the exceptional cases that are to be regularized. Inherent to the nature of the post-processing deep learning neural network, its processing also results in the removal of non-feasible aspects from the received voxel data. The factors addressed here include the smoothing and filling of the desired dento-maxillofacial structures, and the permanent removal of non-feasible voxel data.

[0147] Figures 17A and 17B depict the processing that results in volume reconstruction and interpolation of the classified voxels according to an embodiment of the invention. In particular, figure 17A depicts an image of the classified voxels of tooth and nerve structures, where the voxels are the output of the first deep learning neural network. As shown in the figure, noise and other artifacts in the input data result in irregularities and artifacts in the voxel classification and hence in the 3D surface structures, which include gaps in the sets of voxels that represent a tooth structure. These irregularities and artifacts are especially visible in the structure of the inferior alveolar nerve and in the root structures of the teeth, as also indicated in relation to figure 14B and figure 14D.

[0148] Figure 17B depicts the result of post-processing according to the process described in relation to figure 15 and figure 16. As shown in this figure, the post-processing deep learning neural network successfully removes artifacts that were present in the input data (the classified voxels). The post-processing step successfully reconstructs parts that were substantially affected by irregularities and artifacts, such as the root structures 1702 of the teeth, which now display smooth surfaces that provide an accurate 3D model of the individual tooth structures. High-probability nerve voxels (for example, a probability of 95% or more) can be used by a fitting algorithm to build a 3D model of the nerve structures 1704.
Also note that the imperfections with respect to FDI tooth index labels 37 and 38, as shown in relation to figure 14B, have also been corrected 1706.

[0149] Figure 18 depicts a schematic of a distributed computer system according to one embodiment of the invention. The distributed computer system can be configured to process 3D data based on trained 3D deep learning processors, as described in this application, and to render the processed 3D data. As shown in figure 18, the 3D deep learning processors trained to segment the dento-maxillofacial structures of the 3D data into individual 3D tooth models and to classify the tooth models into tooth types can be part of a distributed system comprising one or more servers 1802 in the network and multiple terminals 1810 1-3, preferably mobile terminals, for example, a desktop computer, a laptop, an electronic tablet, etc. The (trained) 3D deep learning processors can be implemented as server applications 1804, 1806. Further, a client application 1812 1-3 (on a client device) running on the terminals can include a user interface that enables a user to interact with the system and a network interface that enables the client devices to communicate via one or more networks 1808, for example, the Internet, with the server applications. A client device can be configured to receive input data, for example, 3D (CB)CT data representing a dento-maxillofacial structure comprising a dentition, or individual 3D tooth models forming a dentition. The client device can transmit the data to the server application, which can process (pre-process, segment, classify and/or taxonomize) the data based on the methods and systems described in this application. The processed data, for example, taxonomized (labeled) 3D image data of the teeth, can be sent back to the client device, and a rendering engine 1814 1-3 associated with the client device can use the processed 3D image data sets, i.e. the labeled individual 3D tooth models, to render the 3D tooth models and the label information, for example, in the form of a dental chart or the like. In another embodiment, part of the data processing can be performed on the client side. For example, the pre-processing and/or post-processing described in this disclosure can be performed by the client device. In further embodiments, instead of a distributed computer system, a central computer system can be used to perform the pre-processing, post-processing and classification processes described in this application.

[0150] Hence, as shown in figure 18, the invention provides a fully automated pipeline for taxonomy of 3D tooth models. A user can provide 3D image data, for example, 3D (CB)CT data including voxels representing a dentition or a dento-maxillofacial structure comprising a dentition, to the input of the system and, in response, the system will generate individually labeled 3D tooth objects, which can be presented to the user in different graphical formats, for example, as a 3D rendering or as markings on displayed image slices. The input data are automatically optimized for input into the 3D deep neural network, so that the 3D deep neural network processors are able to accurately process the 3D (CB)CT image data without any human intervention. In addition, the invention allows 3D rendering of the output generated by the 3D deep neural network processors, that is, the individually labeled 3D teeth of a dentition.
Such visual information is indispensable for state-of-the-art dental applications in dental care and dental reporting, orthodontics, orthognathic surgery, forensic investigation, biometrics, etc.

[0151] Figure 19 depicts an example of a processed set of teeth resulting from a system described in relation to figure 7, including labels applied to the 3D tooth data sets by a deep neural network that classifies individual teeth, and labels applied to a dentition resulting from post-processing. In particular, figure 19 depicts per tooth, before the dash symbol (for example, 1902), the label with the highest activation value for the individual tooth 3D data set as resulting from classification using a deep learning neural network that is trained using a training method described in relation to figure 5. In addition, figure 19 depicts, per tooth, after the dash symbol (for example, 1904), the label assigned to the individual tooth in the candidate state resolved following the post-processing described in relation to figure 6. The classification labels that are depicted in red (for example, 1906) would be incorrectly classified when considering only the labels with the highest activation resulting from the individual tooth deep learning network. They can be classified incorrectly due, for example, to an insufficiently trained deep learning network or to exceptions in the input data. In this example, labels such as 1904 show the results of the dentition taxonomy that used post-processing, which optimized the assigned highest activations while satisfying the condition that each 3D data set representing an individual tooth is assigned a unique label.

[0152] Figures 20A and 20B depict rendered dentitions comprising labeled 3D tooth models generated by a computer system according to an embodiment of the invention. These rendered dentitions can, for example, be generated by a distributed computer system in the manner described in relation to figures 18A and 18B. Figure 20A depicts a first rendered dentition 2000 1, which includes individually labeled 3D tooth models 2002, where the individual 3D tooth models can be generated based on a 3D CBCT data stack that has been fed to the input of the computer system (as described in relation to figures 7-17). As described in relation to figure 18, the computer system can include a 3D deep learning processor configured to generate individually identified 3D tooth models, for example, in the form of 3D surface meshes, which can be fed to the input of processors that are configured to execute a taxonomy process in order to classify (label) the 3D tooth models (for example, as described in relation to figure 3).

[0153] The trained 3D deep neural network processor of this computer system can classify the 3D tooth data of the dentition into the applicable tooth types, which can be used, for example, in an electronic dental chart 2006 that includes the 32 possible teeth of an adult. As shown in the figure, such a dental chart can include an upper set of teeth that are spatially arranged according to an upper dental arch 2008 1 and a lower set of teeth that are spatially arranged according to a lower dental arch 2008 2. After the taxonomy process, each of the 3D tooth models derived from the voxel representations can be labeled with a tooth type and associated with a position on the dental chart.
For example, the automated taxonomy process can identify a first 3D tooth object 2004 1 as an upper left central incisor (identified in the dental chart as a type 21 tooth 2010 1) and a second 3D tooth object 2004 2 as a cuspid (identified in the dental chart as a type 23 tooth 2010 2).

[0154] During the taxonomy of all the individual 3D tooth models in a 3D data set, the computer can also determine that some teeth are missing (for example, the upper left and upper right third molars and the lower left third molar). In addition, slices of the 3D input data representing the dento-maxillofacial structure can be rendered, for example, a slice of the axial plane 2012 and a slice of the sagittal plane 2016. Because the process includes the classification of the voxels of the 3D input data into different parts of the dento-maxillofacial structure (for example, individual jaw sections, individual teeth or individual nerves), the computer system knows which voxels in the 3D data stack belong to an individual tooth. In this way, the computer can directly relate one or more 3D tooth objects 2004 1,2 to pixels in the slices so that these pixels can be easily selected and highlighted, for example, highlighted pixels 2014 1,2 and 2018, and/or hidden. Figure 20B depicts a rendered dentition 2000 2 including labeled 3D tooth objects 2022, which is similar to figure 20A. The individual 3D tooth models 2024 1,2 can be labeled using a dental chart 2026 and/or slices 2032, 2036, which provide both visual information about the position and type of tooth and the ability to show/hide the labeled classified 3D tooth models and tooth types 2030 1,2. For example, as shown in figure 20B, the system can allow selection of tooth type 22 2030 2 and hide the associated 3D tooth model in the 3D rendering of the dentition.

[0155] Figure 21 is a block diagram illustrating an exemplary data processing system as described in this disclosure. Data processing system 2100 can include at least one processor 2102 coupled to memory elements 2104 via a system bus 2106. As such, the data processing system can store program code in memory elements 2104. Additionally, processor 2102 can execute the program code accessed from memory elements 2104 via system bus 2106. In one aspect, the data processing system can be implemented as a computer that is suitable for storing and/or executing program code. It should be noted, however, that data processing system 2100 can be implemented in the form of any system that includes a processor and a memory and that is capable of performing the functions described in this specification.

[0156] Memory elements 2104 can include one or more physical memory devices, such as, for example, local memory 2108 and one or more mass storage devices 2110. Local memory can refer to random access memory or other non-persistent memory device(s) generally used during the actual execution of the program code. A mass storage device can be implemented as a hard drive or other persistent data storage device. Processing system 2100 may also include one or more cache memories (not shown) that provide temporary storage of at least some program code in order to reduce the number of times program code must be retrieved from mass storage device 2110 during execution.

[0157] Input/output (I/O) devices depicted as input device 2112 and output device 2114 can optionally be coupled to the data processing system.
Examples of input devices may include, but are not limited to, a keyboard, a pointing device such as a mouse, or the like. Examples of output devices may include, but are not limited to, a monitor or display, speakers, or the like. The input device and/or the output device can be coupled to the data processing system either directly or via intervening I/O controllers. A network adapter 2116 can also be coupled to the data processing system to enable it to be coupled to other systems, computer systems, remote network devices and/or remote storage devices via intervening private or public networks. The network adapter may comprise a data receiver for receiving data that is transmitted by said systems, devices and/or networks to said data processing system and a data transmitter for transmitting data from said data processing system to said systems, devices and/or networks. Modems, cable modems and Ethernet cards are examples of different types of network adapters that can be used with data processing system 2100.

[0158] As pictured in figure 21, memory elements 2104 can store an application 2118. It should be noted that data processing system 2100 can additionally run an operating system (not shown) that can facilitate the execution of the application. The application, which is implemented in the form of executable program code, can be executed by data processing system 2100, for example, by processor 2102. Responsive to the execution of the application, the data processing system can be configured to perform one or more operations that will be described here in additional detail.

[0159] In one aspect, for example, data processing system 2100 may represent a client data processing system. In this case, application 2118 can represent a client application that, when executed, configures data processing system 2100 to perform the various functions described here in relation to a "client". Examples of a client may include, but are not limited to, a personal computer, a portable computer, a cell phone, or the like.

[0160] In another aspect, the data processing system can represent a server. For example, the data processing system can represent an (HTTP) server, in which case application 2118, when executed, can configure the data processing system to perform (HTTP) server operations. In another aspect, the data processing system can represent a module, unit or function referred to in this specification.

[0161] The terminology used here is for the purpose of describing particular embodiments only, and is not intended to be limiting of the invention. As used here, the singular forms "a", "an" and "the" are intended to also include the plural forms, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.

[0162] The corresponding structures, materials, acts and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material or act for performing the function in combination with other claimed elements, as specifically claimed.
The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those skilled in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and its practical application, and to enable others skilled in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
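Before turning to the claims, the following sketch illustrates, in concrete form, the dual-path convolutional architecture discussed in relation to figure 16 (and claimed below for the first deep neural network): a first path processes a block of voxels at the target resolution, a second path processes a larger, downsampled block around the same centre point to provide contextual information, and fully connected layers combine both paths into per-voxel class activations. The use of PyTorch, the 9x9x9 block size, the channel widths and the extra background class (33 classes instead of 32 tooth classes) are illustrative assumptions, not part of the claimed method.

```python
import torch
import torch.nn as nn


class TwoPathVoxelClassifier(nn.Module):
    """Sketch of a dual-path 3D CNN: the first path sees a small voxel block at
    the target resolution, the second path sees a downsampled block covering a
    larger real-world volume around the same centre point, and fully connected
    layers fuse both paths into per-voxel class activations."""

    def __init__(self, block=9, num_classes=33):  # 32 tooth classes + background (assumption)
        super().__init__()
        self.block, self.num_classes = block, num_classes

        def feature_layers():
            return nn.Sequential(
                nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            )

        self.target_path = feature_layers()    # processes the first block of voxels
        self.context_path = feature_layers()   # processes the second (contextual) block
        fused = 2 * 32 * block ** 3
        self.fully_connected = nn.Sequential(
            nn.Linear(fused, 512), nn.ReLU(),
            nn.Linear(512, block ** 3 * num_classes),
        )

    def forward(self, target_block, context_block):
        # both inputs: (batch, 1, block, block, block); the context block was cropped
        # from a larger region and downsampled to the same grid size beforehand
        f1 = self.target_path(target_block).flatten(1)
        f2 = self.context_path(context_block).flatten(1)
        logits = self.fully_connected(torch.cat([f1, f2], dim=1))
        return logits.view(-1, self.num_classes, self.block, self.block, self.block)


if __name__ == "__main__":
    net = TwoPathVoxelClassifier()
    target = torch.randn(2, 1, 9, 9, 9)    # block at the target resolution
    context = torch.randn(2, 1, 9, 9, 9)   # downsampled block, larger field of view
    print(net(target, context).shape)      # torch.Size([2, 33, 9, 9, 9])
```

Appending the 3D positional features of figures 11-13 as additional input channels, as in claim 5, would only require changing the number of input channels of both convolutional paths.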
Claims (14)

[1] 1. Computer-implemented method for processing 3D data representing a dento-maxillofacial structure, characterized by comprising: a computer receiving 3D data, preferably 3D conical beam CT, CBCT, data, the 3D data including a voxel representation of the dento-maxillofacial structure, the dento-maxillofacial structure comprising a dentition, a voxel at least being associated with a radiation intensity value, the voxels of the voxel representation defining an image volume; the computer providing the voxel representation to the input of a first 3D deep neural network, the 3D deep neural network being trained to classify the voxels of the voxel representation into one or more tooth classes, preferably into at least 32 tooth classes of a dentition; the first deep neural network comprising a plurality of first 3D convolutional layers that define a first convolutional path and a plurality of second 3D convolutional layers that define a second convolutional path parallel to the first convolutional path, the first convolutional path being configured to receive at its input a first block of voxels from the voxel representation and the second convolutional path being configured to receive a second block of voxels from the voxel representation, the first and second blocks of voxels having the same or substantially the same centre point in the image volume and the second block of voxels representing a volume in real-world dimensions that is larger than the volume in real-world dimensions of the first block of voxels, the second convolutional path determining contextual information for the voxels of the first block of voxels; the output of the first and second convolutional paths being connected to at least one fully connected layer for classifying the voxels of the first block of voxels into one or more tooth classes; and, the computer receiving classified voxels of the voxel representation of the dento-maxillofacial structure from the output of the first 3D deep neural network.

[2] Method according to claim 1, characterized in that the volume of the second block of voxels is larger than the volume of the first block of voxels, the second block of voxels representing a downsampled version of the first block of voxels, preferably the downsampling factor being selected between 20 and 2, more preferably between 10 and 3.

[3] Method according to claim 1 or 2, characterized in that it further comprises: the computer determining one or more individual tooth voxel representations of the dento-maxillofacial structure based on the classified voxels; the computer providing each of the one or more individual tooth voxel representations to the input of a second 3D deep neural network, the second 3D deep neural network being trained to classify a voxel representation of an individual tooth into one of a plurality of tooth classes of a dentition, each tooth class being associated with a candidate tooth class label, the second trained 3D neural network generating, for each of the candidate tooth class labels, an activation value, an activation value associated with a candidate tooth class label defining the probability that a voxel representation of an individual tooth represents a tooth class indicated by the candidate tooth class label.
[4] Method according to any one of claims 1-3, characterized in that it further comprises: determining a dentition taxonomy, which includes: defining candidate dentition states, each candidate state being formed by assigning a candidate tooth class label to each of a plurality of individual tooth voxel representations based on the activation values; and, evaluating the candidate dentition states based on one or more conditions, at least one of the one or more conditions requiring that different candidate tooth class labels be assigned to different individual tooth voxel representations.

[5] Method according to any one of claims 1-4, characterized in that it further comprises: the computer using a pre-processing algorithm to determine 3D positional feature information of the dento-maxillofacial structure, the 3D positional feature information defining, for each voxel in the voxel representation, information about the position of the voxel relative to the position of a dental reference object, for example, a jaw, a dental arch and/or one or more teeth, in the image volume; and, the computer adding the 3D positional feature information to the 3D data before providing the 3D data to the input of the first deep neural network, the added 3D positional feature information providing an additional data channel to the 3D data.

[6] 6. Method according to any one of claims 1-5, characterized in that it further comprises: the computer post-processing the voxels classified by the first 3D deep neural network based on a third trained neural network, the third deep neural network being trained to receive voxels that are classified by the first deep neural network at its input and to correct voxels that are incorrectly classified by the first deep neural network, preferably the third neural network being trained on the basis of the voxels that are classified during the training of the first deep neural network as input, and on the basis of one or more 3D data sets of parts of the dento-maxillofacial structures of the 3D image data of the training set as a target.

[7] 7. Computer-implemented method for training a deep neural network system to process 3D image data of a dento-maxillofacial structure, characterized by comprising: a computer receiving training data, the training data including: 3D input data, preferably 3D conical beam CT (CBCT) data, the 3D input data defining one or more voxel representations of one or more dento-maxillofacial structures, respectively, a voxel being associated with a radiation intensity value, the voxels of a voxel representation defining an image volume; and, the training data further including: 3D data sets of parts of the dento-maxillofacial structures represented by the 3D input data of the training data; the computer using a pre-processing algorithm to determine 3D positional feature information of the dento-maxillofacial structure, the 3D positional feature information defining, for each voxel in the voxel representation, information about the position of the voxel relative to the position of a dental reference object, for example, a mandible, a dental arch and/or one or more teeth, in the image volume; and, using the training data and the one or more 3D positional features to train the first deep neural network to classify voxels into one or more tooth classes, preferably into at least 32 tooth classes of a dentition.

[8] 8.
Method according to claim 7, characterized in that it further comprises: using the voxels that are classified during the training of the first deep neural network and the one or more 3D data sets of parts of the dento-maxillofacial structures of the 3D image data of the training set to train a second neural network to post-process the voxels classified by the first deep neural network, wherein the post-processing by the third neural network includes correcting voxels that are incorrectly classified by the first deep neural network.

[9] 9. Method according to claim 8, characterized in that it further comprises: using the 3D data sets, which are voxel representations of individual teeth to be used as targets for training at least the first deep neural network, to select at least a subset of voxels from the 3D image data that are used as training input to the first deep neural network, the subset being used as training input to a third deep neural network; and, using the tooth class label as associated with the 3D data set that serves as a target for training at least the first deep neural network as the target tooth class label for training the third deep neural network.

[10] 10. Computer system, preferably a server system, adapted to automatically classify 3D image data of teeth, characterized by comprising: a computer-readable storage medium having computer-readable program code embodied therein, the program code including a classification algorithm and a deep neural network; and a processor, preferably a microprocessor, coupled to the computer-readable storage medium, wherein, responsive to the execution of the first computer-readable program code, the processor is configured to perform executable operations that include: receiving 3D image data, preferably 3D conical beam CT (CBCT) image data, the 3D image data defining a voxel image volume, a voxel being associated with an intensity value or a radiation density value, the voxels defining a 3D representation of a dento-maxillofacial structure in the image volume, the dento-maxillofacial structure including a dentition; a trained deep neural network receiving the 3D image data at its input and classifying at least part of the voxels in the image volume into one or more tooth classes, preferably into at least 32 tooth classes of a dentition.

[11] 11.
Computer system, preferably a server system, adapted to automatically taxonomize 3D image data of teeth, characterized by comprising: a computer-readable storage medium having computer-readable program code embodied therein, the program code including a taxonomy algorithm and a trained deep neural network; and a processor, preferably a microprocessor, coupled to the computer-readable storage medium, wherein, responsive to the execution of the first computer-readable program code, the processor is configured to perform executable operations that comprise: receiving 3D image data, preferably 3D conical beam CT (CBCT) data, the 3D image data defining a voxel image volume, a voxel being associated with an intensity value or a radiation density value, the voxels defining a 3D representation of a dento-maxillofacial structure in the image volume, the dento-maxillofacial structure including a dentition; a trained deep neural network receiving the 3D image data at its input and classifying at least part of the voxels in the image volume into at least one or more tooth classes, preferably into at least 32 tooth classes of a dentition; and, determining a dentition taxonomy, which includes defining candidate dentition states, each candidate state being formed by assigning a candidate label to each of the plurality of 3D image data sets based on activation values; and, evaluating the candidate states based on one or more conditions, at least one of the one or more conditions requiring that different candidate tooth labels be assigned to different 3D image data sets.

[12] 12. Computer system, preferably a server system, adapted to automatically taxonomize 3D image data of teeth, characterized by comprising: a computer-readable storage medium having computer-readable program code embodied therein, the program code including a taxonomy algorithm and trained deep neural networks; and a processor, preferably a microprocessor, coupled to the computer-readable storage medium, wherein, responsive to the execution of the first computer-readable program code, the processor is configured to perform executable operations that include: receiving 3D image data, preferably 3D conical beam CT (CBCT) data, the 3D image data defining a voxel image volume, a voxel being associated with an intensity value or a radiation density value, the voxels defining a 3D representation of a dento-maxillofacial structure in the image volume, the dento-maxillofacial structure including a dentition; a first trained deep neural network receiving the 3D image data at its input and classifying at least part of the voxels in the image volume into at least one or more tooth classes, preferably into at least 32 tooth classes of a dentition; a second trained deep neural network receiving the results of the first trained deep neural network and classifying the per-individual-tooth subsets of the received voxel representations into individual tooth class labels; and, determining a dentition taxonomy, which includes defining candidate dentition states, each candidate state being formed by assigning a candidate label to each of the plurality of 3D image data sets based on activation values; and, evaluating the candidate states based on one or more conditions, at least one of the one or more conditions requiring that different candidate tooth class labels be assigned to different 3D image data sets.

[13] 13.
Client device, preferably a mobile client device, adapted to communicate with a server system, the server system being adapted to automatically taxonomize 3D image data of teeth as defined in claims 10-12, characterized in that the client device comprises: a computer-readable storage medium having computer-readable program code embodied therein, and a processor, preferably a microprocessor, coupled to the computer-readable storage medium and coupled to a display device, wherein, responsive to the execution of the first computer-readable program code, the processor is configured to perform executable operations that include: transmitting 3D image data, preferably 3D conical beam CT (CBCT) image data, the 3D image data defining a voxel image volume, a voxel being associated with an intensity value or a radiation density value, the voxels defining a 3D representation of a dento-maxillofacial structure in the image volume, the dento-maxillofacial structure including a dentition; requesting the server system to segment, classify and taxonomize the 3D image data of the teeth; receiving a plurality of 3D image data sets, each 3D image data set defining a voxel image volume, the voxels defining a 3D tooth model in the image volume, the plurality of 3D image data sets forming the dentition; receiving one or more tooth class labels associated with the one or more 3D image data sets; and, rendering the one or more 3D image data sets and the one or more associated tooth class labels on a display.

[14] 14. Computer program product, characterized by comprising pieces of software code configured to, when executed in the memory of a computer, perform the steps of the method as defined in any one of claims 1-9.
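As a final illustration, the dentition taxonomy of claims 4, 11 and 12 (and of figure 19) can be read as selecting, among candidate dentition states, the one with the highest summed activation values under the condition that no two individual tooth data sets receive the same candidate tooth class label. The sketch below resolves that condition as an assignment problem using SciPy's Hungarian-algorithm implementation; this particular solver, the label set and the example activation values are assumptions made for the illustration, not elements of the claims.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def resolve_dentition(activations, class_labels):
    """activations[i, j] is the activation value of candidate tooth class label
    class_labels[j] for the i-th individual 3D tooth data set. Returns one label
    per tooth such that no label is used twice and the summed activation is maximal."""
    cost = -np.asarray(activations)               # maximise the summed activation
    teeth, labels = linear_sum_assignment(cost)   # unique label per tooth
    return {int(t): class_labels[l] for t, l in zip(teeth, labels)}


# usage sketch: three detected teeth, four hypothetical FDI candidate labels
fdi_labels = ["11", "21", "22", "23"]
activations = np.array([
    [0.90, 0.40, 0.05, 0.02],
    [0.86, 0.80, 0.10, 0.05],   # its highest activation ("11") is already taken by tooth 0
    [0.05, 0.10, 0.30, 0.75],
])
print(resolve_dentition(activations, fdi_labels))  # {0: '11', 1: '21', 2: '23'}
```

In the example, the second tooth's highest-activation label is already claimed by the first tooth, so the resolved state assigns it the next-best unique label, mirroring the corrections shown after the dash symbols in figure 19.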